Abstract

We show how to efficiently simulate continuous-time quantum query algorithms that run in time $T$ in a manner that preserves the query complexity (within a polylogarithmic factor) while also incurring a small overhead cost in the total number of gates between queries. By small overhead, we mean $T$ within a factor that is polylogarithmic in terms of $T$ and a cost measure that reflects the cost of computing the driving Hamiltonian. This permits any continuous-time quantum algorithm based on an efficiently computable driving Hamiltonian to be converted into a gate-efficient algorithm with similar running time.

1 Introduction and Summary of Result 

The continuous-time query model [1] can be thought of as a variant of the standard query model, but where arbitrarily small $\lambda$-fractional queries to the data $x_1 x_2 \ldots x_L \in \{0,1\}^L$, of the form $|j\rangle \mapsto e^{i\pi\lambda x_j}|j\rangle$, can be made at cost only $\lambda$. In the limit as $\lambda \to 0$, such algorithms become continuous-time Hamiltonian evolution processes.

We show that any continuous-time quantum query algorithm whose total query time is $T$ and whose driving Hamiltonian is implementable with $G$ 1- and 2-qubit gates (in a sense defined in Section 3) can be simulated by a discrete-query quantum algorithm using the following resources:

• $O(T \log T / \log\log T)$ queries

• $O(TG\log(T) + T\log^3(\|H\|T))$ 1- and 2-qubit gates [or $O(TG\log(T) + TG^3)$ in terms of just $T$ and $G$]

• $O(\log^3(\|H\|T))$ qubits of space [or $O(G^3)$].

This extends a previous result [2] where the query cost is the same, but where the orders of the second and third resource costs are at least $T^2\,\mathrm{polylog}\,T$ and $T\,\mathrm{polylog}\,T$ respectively. The present result can also be compared with the result [3] where the query cost is superior to ours, $O(T)$ (which is asymptotically optimal), but whose methodology does not (as far as we know) yield an efficient gate construction from an efficiently implementable driving Hamiltonian.
2 Significance to Quantum Computation



The continuous-time query model is an important tool for designing algorithms, and for example yielded the algorithm for AND-OR tree evaluation [4]. The difficulty with continuous-time quantum algorithms is that, in order to implement them on quantum computers, these abstract query algorithms need to be translated into concrete algorithms with subroutines substituted for the black-box queries.$^1$ In these circumstances, what matters is the total gate complexity, which can be large if the cost of the operations performed between the queries is large, even if the number of queries is small. The contribution of our result is that it provides a systematic way to obtain a gate-efficient discrete-query algorithm from any continuous-time query algorithm where the driving Hamiltonian can be efficiently implemented. That is, whenever the implementation cost of the driving Hamiltonian is small, the total gate complexity is not much more than the query complexity times the cost of implementing each query.

Consider applying the continuous-time quantum algorithm in [3] for AND-OR tree evaluation to evaluate expressions of the form

$$\exists x_1 \forall x_2 \exists x_3 \cdots \forall x_L\, f(x_1, x_2, \ldots, x_L), \tag{1}$$

where one is given a polynomial (in $L$) size circuit implementation of $f : \{0,1\}^L \to \{0,1\}$. This corresponds to evaluating a balanced binary AND-OR tree of size $N = 2^L$. A continuous-time query algorithm achieving time $O(\sqrt{N})$ cannot be simulated directly from $f$, because a small $\lambda$-fractional query to $f$ cannot be computed at cost proportional to $\lambda$; the algorithm must be efficiently translated into the discrete-query framework to be implementable. But if we substitute the parameters into the simulation in [2], we obtain a gate cost of order $N\,\mathrm{polylog}\,N$ (losing the square-root speedup) and consume order $\sqrt{N}\,\mathrm{polylog}\,N$ qubits of space. The simulation in [3] does not appear to yield any bounds less than $O(N)$ on the gate cost. However, our present simulation results in $N^{1/2+o(1)}$ gates and $O(\mathrm{polylog}\,N)$ space (using the fact that the driving Hamiltonian in [4] can be implemented with gates). We remark that, for this particular example, a better simulation that is specific to AND-OR tree evaluation (that was discovered after [3]) is known [5].

3 Precise Statement of Main Result 

Prior to stating our main result, we give a precise definition of the implementation cost of a Hamiltonian acting on $\ell$ qubits, which is the cost of realising the unitary operation corresponding to evolution under the Hamiltonian from a start time to a finish time. A preliminary idealised definition is as a unitary operation with the following properties. It acts on three registers: a start time, a finish time and an $\ell$-qubit state. For any start and finish times $t_s$ and $t_f$, and any $\ell$-qubit state $|\psi\rangle$, the unitary operation maps $|t_s\rangle|t_f\rangle|\psi\rangle$ to $|t_s\rangle|t_f\rangle|\psi'\rangle$, where $|\psi'\rangle$ is the state that results when $|\psi\rangle$ evolves under $H$ from time $t_s$ to time $t_f$. Assuming that all three registers are finite-dimensional, this can be denoted as a gate as in Fig. 1. We will not require the unitary to be implemented perfectly. We introduce a precision parameter $\epsilon'$, and permit the unitary evolution to be approximated within $\epsilon'$. This leads to the following definition.

1 A query is typically not something that could be physically implemented directly via continuous-time Hamiltonian 
evolution, as in an analog quantum computer. A query corresponds to the coherent evaluation of a classical function 
on several qubits, and requires several quantum gates to implement, regardless of whether it is a full query or a 
fractional query. 
Figure 1: Controlled evolution under Hamiltonian $H$, with start time $t_s$, finish time $t_f$, and target state $|\psi\rangle$.

Definition 3.1. Let $H$ be a Hamiltonian acting on $\ell$ qubits. Define $H$ to be implementable within precision $\epsilon'$ with $G$ gates if the following unitary operation can be implemented within precision $\epsilon'$ with $G$ elementary gates. The unitary acts on three registers: a start time and finish time, and $\ell$ qubits set to the initial state. The unitary maps $|t_s\rangle|t_f\rangle|\psi\rangle$ to $|t_s\rangle|t_f\rangle|\psi'\rangle$, where $|\psi'\rangle$ is the state that results when $|\psi\rangle$ evolves under $H$ from time $t_s$ to time $t_f$. By approximating within $\epsilon'$, we mean with respect to the completely bounded norm.

We are now ready to state our main result. 

Theorem 3.2. (Main) Let $H(t)$ be a driving Hamiltonian that is approximately implementable within precision $1/T$ using $G$ gates. Then the continuous-time query algorithm can be simulated with constant error by a discrete-query quantum algorithm using the following resources:

• $O(T \log T / \log\log T)$ queries

• $O(TG\log(T) + T\log^3(\|H\|T))$ 1- and 2-qubit gates

• $O(\log^3(\|H\|T))$ qubits of space.

In particular, when $G$ is $\mathrm{polylog}(T)$, this is $\tilde{O}(T)$ queries, $\tilde{O}(T)$ 1- and 2-qubit gates, and $\mathrm{polylog}(T)$ qubits of space. The norm $\|H\|$ is taken to be $\|H\| := \sup_{t\in[0,T]} \|H(t)\|$ for time-dependent $H(t)$. Because the gate complexity scales linearly in $G$, we require the driving Hamiltonian to be simulatable efficiently in order for the simulation to be gate-efficient. If, for example, $G$ scaled linearly in $\|H\|$, then the gate complexity would be linear in $\|H\|T$, which is similar to the complexity obtained by product formulae [6]. On the other hand, we have a lower bound of $G = \Omega(\log(\|H\|T))$ (see Section 4.6). As a result, we could express the gate complexity as $O(TG\log(T) + TG^3)$, and the number of qubits of space as $O(G^3)$.

The remaining sections explain our algorithm and give the proof of Theorem 3.2.

4 Compressed CGMSY Construction 

We will summarise the construction in [2], and then show how to make it more efficient by compressing the control registers. Before doing so, we state the notation used throughout this paper.

Figure 2: The construction from Ref. [2] to simulate a time interval of length 1/4.



Notation. We denote the sets of linear, Hermitian, and unitary operators acting on complex Euclidean space $\mathcal{X}$ as $\mathcal{L}(\mathcal{X})$, $\mathcal{H}(\mathcal{X})$, and $\mathcal{U}(\mathcal{X})$, respectively. The spectral norm of operator $A$ is $\|A\| := \max\{\|A|v\rangle\|_2 : \||v\rangle\|_2 = 1\}$. The norm of time-dependent operator $A(t)$ is given by $\|A\| = \sup_t \|A(t)\|$. The completely bounded norm, or diamond norm, of superoperator $\Phi : \mathcal{L}(\mathcal{X}) \to \mathcal{L}(\mathcal{Y})$ is defined as

$$\|\Phi\|_\diamond := \|\Phi \otimes \mathbb{1}_{\mathcal{L}(\mathcal{X})}\|_1,$$

where the superoperator trace-norm is given by $\|\Phi\|_1 = \max\{\|\Phi(X)\|_{\rm tr} \mid X \in \mathcal{L}(\mathcal{X}), \|X\|_{\rm tr} \le 1\}$. All logarithms are taken to base 2. We define $[m] := \{1,\ldots,m\}$. The tensor product of many zero computational basis states will be represented in compact form as $|0^n\rangle := |0\rangle^{\otimes n}$.



4.1 Overview of the CGMSY Construction [2] 

Our result is obtained by simulating the construction in [2], but by representing some of the qubits in a highly compressed form. This compressed form was known by the authors of [2], but it was not known that all of the steps of the construction can be carried out within the compressed form, especially the measurement of control qubits.

The construction in [2] begins with a fractional query algorithm with total query cost $T$. This is partitioned into segments corresponding to time intervals of the form $[t_0, t_0 + 1/4]$, and with $m$ fractional queries of size $1/4m$ in each such interval. Here, $m$ can be chosen as a power of two without loss of generality; we henceforth assume this is the case. In this work we consider the simulation of each of these segments. In [2] it is shown that each segment can be simulated by a circuit of the form of Fig. 2, where $k' \in O(\log(T)/\log\log(T))$, and whose gates are as follows.

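On the first $m$ qubits, collectively called the control register, the gates are

$$R = \begin{pmatrix} \alpha & \beta \\ \beta & -\alpha \end{pmatrix}, \qquad P = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}, \qquad \text{with } \beta \approx 1/\sqrt{8m}. \tag{2}$$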

The gates labeled $Q$ are the (full) queries; the register they act on is called the target register. The gates $V_1, \ldots, V_{k'}$ are the unitaries corresponding to evolving the driving Hamiltonian for various time intervals specified by the control qubits: $V_1$ for the time interval from $t_0$ to the position of the first one in the control qubits; $V_2$ for the time interval delineated by the positions of the first and second ones in the control qubits; and so on. The simulation is successful if $b_1 = \cdots = b_m = 0$, which occurs with probability $\ge 3/4$. The value $\beta^2 = \sin(1/8m) \approx 1/8m$ corresponds to a time interval of 1/4. This time interval is chosen to ensure that the success probability is $\ge 3/4$.

In the case that the simulation is not successful, then there are errors at times corresponding to the $b_j$ that are equal to 1. Reference [2] shows how to correct unsuccessful instances. That analysis continues to hold here without modification. The only subtlety is that we also need to account for the number of gates needed to perform the gates $V_j$. Each gate may need to be divided into a number of parts corresponding to the number of errors (ones) found. It is shown in Ref. [2] that the average number of ones is $O(1)$, so the average number of oracle queries is at most multiplied by a constant factor. Moreover, if the total number of oracle queries permitted is bounded by $O(1/\epsilon_{\rm tot})$ times the average value, then by the Markov bound, the probability is at least $1 - O(\epsilon_{\rm tot})$ that the overall correction procedure terminates within this bound [2]. Failure to terminate within the bound can be included in the $\epsilon_{\rm tot}$ allowable error. For the main result in Theorem 3.2 constant error is considered, so this does not alter the result.

When analysing the complexity due to correcting unsuccessful instances, another factor that needs to be considered is the additional complexity due to correcting the individual errors. The average of this complexity was denoted $C_0$ in Ref. [2], but an upper bound was not considered. As before, an upper bound equal to $O(1/\epsilon_{\rm tot})$ times the average value will not be exceeded with probability $1 - O(\epsilon_{\rm tot})$. Again this does not affect the result in Theorem 3.2, as constant error is considered. As a result of these considerations, when taking into account the corrections, the number of oracle queries and the number of additional 1- and 2-qubit gates are at most multiplied by a constant factor. This means that the correction operations do not alter the scaling, and we do not need to consider them further.

The state of the control registers $R^{\otimes m}|0^m\rangle = (\alpha|0\rangle + \beta|1\rangle)^{\otimes m}$ is highly "compressible" in that most of its amplitude is concentrated on basis states with low Hamming weight. A natural succinct representation of this state is in terms of the positions of the ones in binary. We first define such a succinct form precisely (Section 4.2). We then show how the above circuit can be simulated with the control qubits in their succinct form in these three stages: the initial stage (Section 4.3), which is the construction of the state $R^{\otimes m}|0^m\rangle$; the intermediate stage (Sections 4.4 and 4.5), where $P^{\otimes m}$ is applied to the control qubits and then the queries and driving operations occur; and the final stage (Section 5), which is where the control qubits are measured with respect to the basis

$$\{R^{\otimes m}|x\rangle : x \in \{0,1\}^m\}.$$

4.2 Succinct Representation of Control Qubits

We now propose a succinct encoding scheme which accurately reproduces low Hamming weight basis states. Specifically, consider the set of all $m$-bit strings whose Hamming weight is at most $k+1$, where $k$ is much smaller than $m$. The size of this set is bounded above by $(m+1)^{k+1}$. Our encoding scheme utilizes a set of $(m+1)^{k+1}$ strings to accurately represent this space as follows. We use the notation $|x|$ to denote the Hamming weight of $x \in \{0,1\}^*$. The value of $k$ is chosen to ensure that the error due to omitting high Hamming weight components is no more than $\epsilon$, and therefore can be taken as

$$k \in O\!\left(\frac{\log(1/\epsilon)}{\log\log(1/\epsilon)}\right). \tag{3}$$

We also use a slightly smaller value $k'$ to ensure that the error is no more than $\epsilon'$; the relation between these primed variables is identical. The Hamming weight cutoff $k$ is used to limit errors that occur repeatedly in the compressed measurement protocol. In contrast, Hamming weight cutoff $k'$ is used to limit errors that only occur once. In particular, we limit the total number of controlled oracle calls to $k'$, because the error due to limiting the Hamming weight there only occurs once. We also limit the number of ones that are measured to $k'$. The Hamming weight cutoff $k$ is used in our compressed encoding, as the error due to this cutoff will contribute multiple times.
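As a rough numerical illustration of why a small Hamming weight cutoff suffices, the following sketch computes the probability weight that $(\alpha|0\rangle+\beta|1\rangle)^{\otimes m}$ places on basis states of Hamming weight greater than $k$, taking $\beta^2 = \sin(1/8m)$ as in Section 4.1. The function name `tail_mass`, the 40-term truncation, and the sample values are illustrative assumptions, not part of the construction.

```python
import math

def tail_mass(m: int, k: int) -> float:
    """Weight on Hamming weights > k when each of m bits is 1 with
    probability beta^2 = sin(1/(8m)).

    The sum is truncated after 40 terms; the omitted terms are negligible
    here because the expected Hamming weight m*beta^2 is about 1/8.
    """
    p = math.sin(1.0 / (8 * m))
    upper = min(m, k + 40)
    return sum(
        math.comb(m, h) * p**h * (1 - p) ** (m - h)
        for h in range(k + 1, upper + 1)
    )

if __name__ == "__main__":
    m = 1024
    for k in range(7):
        # The tail drops rapidly as the cutoff k grows.
        print(k, tail_mass(m, k))
```

The rapid decay with $k$ is what allows a cutoff that grows only slowly with $1/\epsilon$.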
Definition 4.1. Define the encoding scheme $C_m^k$ on $|x\rangle$ for $x \in \{0,1\}^m$, $|x| \le k$ as follows. For $x = 0^{s_1}10^{s_2}10^{s_3}\ldots0^{s_h}10^t$, where $h := |x|$, $h < k+1$ and $t = m - s_1 - \cdots - s_h - h$,

$$C_m^k|x\rangle = |s_1, s_2, \ldots, s_h, \underbrace{m, \ldots, m}_{k+1-h}\rangle, \tag{4}$$

where $C_m^k|x\rangle \in (\mathbb{C}^{m+1})^{\otimes k+1}$. For $h \ge k+1$, $C_m^k|x\rangle$ encodes the positions of the first $k+1$ ones.
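To make the encoding concrete, here is a minimal classical sketch of the map on bit strings: it records the run of zeros before each one and pads the remaining registers with the value $m$. The function names `encode_succinct` and `decode_succinct` are hypothetical helpers introduced only for illustration.

```python
def encode_succinct(x: str, k: int) -> list[int]:
    """Map an m-bit string to k+1 register values: the zero-run lengths
    s_1, ..., s_h before each one, padded with the value m."""
    m = len(x)
    runs, run = [], 0
    for bit in x:
        if bit == "1":
            runs.append(run)
            run = 0
        else:
            run += 1
    runs = runs[: k + 1]  # keep only the first k+1 ones
    return runs + [m] * (k + 1 - len(runs))

def decode_succinct(regs: list[int], m: int) -> str:
    """Inverse map, valid for strings of Hamming weight at most k+1."""
    pieces = []
    for s in regs:
        if s == m:  # padding value: a genuine run is at most m-1
            break
        pieces.append("0" * s + "1")
    body = "".join(pieces)
    return body + "0" * (m - len(body))
```

For example, with $m = 7$ and $k = 3$ the string `0010100` has runs $s_1 = 2$, $s_2 = 1$ and is stored as the four registers `[2, 1, 7, 7]`.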



4.3 Initialization of Control Qubits in Alternative Encoding 

We now show how to simulate the preparation of the state after operation $R^{\otimes m}$ (but before $P^{\otimes m}$, which is deferred to Section 4.5) in succinct form using the encoding of Definition 4.1. To begin, in the original circuit there are $m$ control qubits, whose state is initialised to $(\alpha|0\rangle + \beta|1\rangle)^{\otimes m}$, where $\beta \in \Theta(1/\sqrt{m})$. The amplitudes of terms in this superposition decrease factorially with Hamming weight, and in particular, one can write

$$\begin{aligned}
(\alpha|0\rangle + \beta|1\rangle)^{\otimes m} &= \sum_{x\in\{0,1\}^m} \alpha^{m-|x|}\beta^{|x|}|x\rangle \\
&= \sum_{\substack{x\in\{0,1\}^m \\ |x|\le k}} \alpha^{m-|x|}\beta^{|x|}|x\rangle + \sum_{\substack{x\in\{0,1\}^m \\ |x|> k}} \alpha^{m-|x|}\beta^{|x|}|x\rangle \\
&= \sum_{\substack{x\in\{0,1\}^m \\ |x|\le k}} \alpha^{m-|x|}\beta^{|x|}|x\rangle + \mu|\nu\rangle,
\end{aligned} \tag{5}$$

where, on the last line, $|\nu\rangle$ is orthogonal to every basis state in the sum that precedes it, and $\mu^2 \in 1/2^{\Omega(k\log k)}$. Using $k$ as in Eq. (3), $\mu^2 \in O(\epsilon)$.

Using our encoding scheme $C_m^k$, we aim to prepare a state of the form

$$\sum_{\substack{x\in\{0,1\}^m \\ |x|\le k}} \alpha^{m-|x|}\beta^{|x|}\,C_m^k|x\rangle + \mu|\nu'\rangle, \tag{6}$$

for some $|\nu'\rangle \in (\mathbb{C}^{m+1})^{\otimes k+1}$ orthogonal to all the kets arising in the sum. This state has the correct amplitudes for all encodings of strings with Hamming weights up to $k$. Because we choose $k$ such that $\mu^2 \in O(\epsilon)$, the error in the encoding is $O(\epsilon)$.

We now show how to construct an approximation (within distance $\epsilon$) of the state in Eq. (6) using $\mathrm{poly}(k, \log m)$ gates. Note that, to accomplish this, we must avoid any approach based on first constructing the expanded state in Eq. (5) then applying $C_m^k$, since this would immediately entail order $m$ gates. Our efficient approach is to first prepare a state similar to Eq. (5) using a slightly different encoding scheme than $C_m^k$, denoted $B_q^k$. We then postprocess the state so that the encoding is changed from $B_q^k$ to $C_m^k$ [i.e. Eq. (6)].

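We now introduce the encoding $B_q^k$ by explicit construction. Specifically, it is based on the exponential superposition state

$$|\phi_q\rangle := \sum_{s=0}^{q-1} \beta\alpha^s|s\rangle + \alpha^q|q\rangle. \tag{7}$$

The state is very simple to prepare when $q = 2^r$, as follows. Define the unitary matrix

$$M(\gamma) := \frac{1}{\sqrt{1+\gamma^2}}\begin{pmatrix} 1 & -\gamma \\ \gamma & 1 \end{pmatrix}. \tag{8}$$

Note that

$$\begin{aligned}
M(\alpha^{2^{r-1}}) \otimes \cdots \otimes M(\alpha^2) \otimes M(\alpha)|0^r\rangle
&= \frac{\beta}{\sqrt{1-\alpha^{2q}}}\left(|0\ldots00\rangle + \alpha|0\ldots01\rangle + \alpha^2|0\ldots10\rangle + \cdots + \alpha^{q-1}|1\ldots11\rangle\right) \\
&= \frac{\beta}{\sqrt{1-\alpha^{2q}}}\sum_{s=0}^{q-1}\alpha^s|s\rangle.
\end{aligned} \tag{9}$$

Therefore, a circuit that maps $|0^{r+1}\rangle$ to $|\phi_q\rangle$ can be obtained by first applying a one-qubit gate on the first qubit to put it in state $\sqrt{1-\alpha^{2q}}\,|0\rangle + \alpha^q|1\rangle$, and then applying a sequence of controlled-$M(\alpha^{2^j})$ gates (each controlled by the first qubit being in state $|0\rangle$) to create the state

$$\begin{aligned}
&\beta|0\rangle\left(|0\ldots00\rangle + \alpha|0\ldots01\rangle + \alpha^2|0\ldots10\rangle + \cdots + \alpha^{q-1}|1\ldots11\rangle\right) + \alpha^q|1\rangle|0\cdots00\rangle \\
&\quad= \beta\left(|00\ldots00\rangle + \alpha|00\ldots01\rangle + \alpha^2|00\ldots10\rangle + \cdots + \alpha^{q-1}|01\ldots11\rangle\right) + \alpha^q|10\cdots00\rangle \\
&\quad= |\phi_q\rangle.
\end{aligned} \tag{10}$$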
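The product structure above can be checked numerically for small sizes. The sketch below, assuming NumPy is available, builds the tensor product of the one-qubit states $M(\alpha^{2^j})|0\rangle$ and compares it with the geometric amplitudes; it then assembles the $(r+1)$-qubit state produced by the controlled construction. The helper names (`M`, `geometric_register`, `phi`) are illustrative, and the sign convention in the second column of `M` is an assumption that does not affect the check.

```python
import numpy as np

def M(gamma: float) -> np.ndarray:
    """One-qubit unitary sending |0> to (|0> + gamma|1>)/sqrt(1 + gamma^2)."""
    return np.array([[1.0, -gamma], [gamma, 1.0]]) / np.sqrt(1.0 + gamma**2)

def geometric_register(alpha: float, r: int) -> np.ndarray:
    """M(alpha^(2^(r-1))) x ... x M(alpha^2) x M(alpha) applied to |0^r>.
    The leftmost factor acts on the most significant bit."""
    state = np.array([1.0])
    for j in reversed(range(r)):  # most significant qubit first
        state = np.kron(state, M(alpha ** (2**j))[:, 0])
    return state

def phi(alpha: float, q: int) -> np.ndarray:
    """Amplitudes of |phi_q> on the basis states |0>, ..., |q>."""
    beta = np.sqrt(1.0 - alpha**2)
    return np.array([beta * alpha**s for s in range(q)] + [alpha**q])

if __name__ == "__main__":
    r, beta = 4, 0.125
    q, alpha = 2**r, np.sqrt(1 - 0.125**2)
    reg = geometric_register(alpha, r)
    # Leading qubit in sqrt(1 - alpha^(2q))|0> + alpha^q|1>, M gates applied
    # only on the |0> branch.
    full = np.concatenate([np.sqrt(1 - alpha ** (2 * q)) * reg,
                           [alpha**q] + [0.0] * (q - 1)])
    print(np.allclose(full[: q + 1], phi(alpha, q)))
```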

The reason why state $|\phi_q\rangle$ is useful is because, for $q \ge m$, $|\phi_q\rangle^{\otimes k+1}$ yields a state similar to Eq. (6). The encoding, which we will call $B_q^k|x\rangle$, is slightly different than $C_m^k|x\rangle$, but can be efficiently translated into $C_m^k|x\rangle$ with some "clean-up" operations. Specifically, the encoding is as defined below.

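Definition 4.2. Define the encoding scheme $B_q^k$ on $|x\rangle$ for $x \in \{0,1\}^m$, $|x| \le k$ as follows. For $x = 0^{s_1}10^{s_2}10^{s_3}\ldots0^{s_h}10^t$, where $h := |x|$, $h \le k$ and $t = m - s_1 - \cdots - s_h - h$,

$$B_q^k|x\rangle = |s_1, \ldots, s_h\rangle \left(\sum_{j=0}^{q-t-1} \alpha^j\beta|j+t\rangle + \alpha^{q-t}|q\rangle\right) |\phi_q\rangle^{\otimes k-h}, \tag{11}$$

where $B_q^k|x\rangle \in (\mathbb{C}^{q+1})^{\otimes k+1}$.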

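The state is then given as in the following theorem.

Theorem 4.3. For $q \ge m$, the states $B_q^k|x\rangle$ for $|x| \le k$ are orthonormal and

$$|\phi_q\rangle^{\otimes k+1} = \sum_{\substack{x\in\{0,1\}^m \\ |x|\le k}} \alpha^{m-|x|}\beta^{|x|}\,B_q^k|x\rangle + \mu|\nu'\rangle, \tag{12}$$

for some $|\nu'\rangle$ orthogonal to all $B_q^k|x\rangle$ for $|x| \le k$.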

Proof. The state $|\phi_q\rangle^{\otimes k+1}$ is a superposition of (computational) basis states of the form $|s_1, \ldots, s_{k+1}\rangle$, where $s_1, \ldots, s_{k+1} \in \{0, 1, \ldots, q\}$. Intuitively, it is useful to think of each such basis state as an encoding of a binary string $0^{s_1}10^{s_2}1\cdots0^{s_{k+1}}1$ (whose Hamming weight is $k+1$ and length is $s_1 + \cdots + s_{k+1} + k + 1$). We will show that these basis states can be naturally partitioned into equivalence classes: one for each prefix $x \in \{0,1\}^m$ with $|x| \le k$, and one for all the remaining basis states.

Let $x \in \{0,1\}^m$ with $h = |x| \le k$ be of the form $x = 0^{s_1}10^{s_2}10^{s_3}\ldots0^{s_h}10^t$. Consider the set $P_x$ that consists of all $|s'_1, s'_2, \ldots, s'_{k+1}\rangle$ that are encodings of strings whose $m$-bit prefix is $x$. The set $P_x$ consists of all $|s'_1, s'_2, \ldots, s'_{k+1}\rangle$ such that $(s'_1, s'_2, \ldots, s'_h) = (s_1, s_2, \ldots, s_h)$, $s'_{h+1} \in \{t, \ldots, q\}$, and $s'_{h+2}, \ldots, s'_{k+1} \in \{0, \ldots, q\}$. It follows that the sum of all the terms in the superposition

$$\sum_{s'_1=0}^{q}\sum_{s'_2=0}^{q}\cdots\sum_{s'_{k+1}=0}^{q} \alpha^{s'_1+s'_2+\cdots+s'_{k+1}}\,\beta^{|\{j\,:\,s'_j<q\}|}\,|s'_1, s'_2, \ldots, s'_{k+1}\rangle \tag{13}$$

that correspond to elements of $P_x$ is

$$\begin{aligned}
&\alpha^{s_1}\beta\cdots\alpha^{s_h}\beta\,|s_1, \ldots, s_h\rangle\left(\sum_{j=t}^{q-1}\alpha^j\beta|j\rangle + \alpha^q|q\rangle\right)|\phi_q\rangle^{\otimes k-h} \\
&\quad= \alpha^{s_1}\beta\cdots\alpha^{s_h}\beta\,\alpha^t\,|s_1, \ldots, s_h\rangle\left(\sum_{j=0}^{q-t-1}\alpha^j\beta|j+t\rangle + \alpha^{q-t}|q\rangle\right)|\phi_q\rangle^{\otimes k-h} \\
&\quad= \alpha^{m-|x|}\beta^{|x|}\,B_q^k|x\rangle,
\end{aligned} \tag{14}$$

which is the appropriate weighting for $B_q^k|x\rangle$ in the sum in Eq. (12).

Thus, the basis states in the superposition in Eq. (13) corresponding to encodings of strings $x \in \{0,1\}^m$ of Hamming weight at most $k$ can be grouped into equivalence classes $P_x$. What about the remaining terms in $|\phi_q\rangle^{\otimes k+1}$ which do not fall in any $P_x$? These are the $|s_1, \ldots, s_{k+1}\rangle$ where $s_1 + \cdots + s_{k+1} + k + 1 \le m$. Therefore, we can set

$$\mu|\nu'\rangle = \sum_{s_1+\cdots+s_{k+1}+k+1\le m} \alpha^{s_1+\cdots+s_{k+1}}\beta^{k+1}\,|s_1, \ldots, s_{k+1}\rangle, \tag{15}$$

where $\mu \in \mathbb{R}$ is chosen so that $|\nu'\rangle$ is normalised. All the $B_q^k|x\rangle$ and $|\nu'\rangle$ are mutually orthogonal since they are constructed from a partition of the basis states. □
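The partition argument in the proof can be checked by brute force for small parameters. The sketch below enumerates every basis state of $|\phi_q\rangle^{\otimes k+1}$, assigns it to the class determined by the ones falling within the first $m$ bits, and accumulates the squared amplitude of each class; every class with $h \le k$ ones should carry weight $\alpha^{2(m-h)}\beta^{2h}$. The function name `class_weights` and the parameter values are illustrative assumptions.

```python
import itertools
import math
from collections import defaultdict

def class_weights(m: int, k: int, q: int, beta: float):
    """Total squared amplitude of each equivalence class of basis states
    |s_1, ..., s_{k+1}> in |phi_q>^{(k+1)}, keyed by the runs (s_1..s_h)
    of the ones that fall inside the first m bits."""
    alpha = math.sqrt(1.0 - beta**2)
    amp = [beta * alpha**s for s in range(q)] + [alpha**q]
    weights = defaultdict(float)
    for regs in itertools.product(range(q + 1), repeat=k + 1):
        a = math.prod(amp[s] for s in regs)
        pos, prefix = 0, []
        for s in regs:
            pos += s + 1          # position of the next one
            if pos > m:           # this one lies beyond the m-bit prefix
                break
            prefix.append(s)
        key = tuple(prefix) if len(prefix) <= k else "rest"
        weights[key] += a * a
    return alpha, dict(weights)

if __name__ == "__main__":
    m, k, q, beta = 5, 2, 6, 0.3
    alpha, w = class_weights(m, k, q, beta)
    for key, value in sorted((kv for kv in w.items() if kv[0] != "rest")):
        h = len(key)
        print(key, value, alpha ** (2 * (m - h)) * beta ** (2 * h))
```

The class labelled `"rest"` collects the basis states with all $k+1$ ones inside the first $m$ bits, which make up $\mu|\nu'\rangle$.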



4.4 Converting from the B Encoding to the C Encoding 

We have thus far shown how to prepare states in the encoding $B_q^k$. As mentioned above, we can now convert from the encoding $B_q^k$ to our desired encoding $C_m^k$. This is achieved by "cleaning up" the registers that follow register $h = |x|$ in $B_q^k|x\rangle$ [compare Eq. (4) with Eq. (11)]. The difference is that, instead of these registers being in the state $|m\rangle$, they are in the state $|\phi_q\rangle$ (for registers $h+2$ to $k+1$). Register $h+1$ is in a state that is similar to $|\phi_{q-t}\rangle$, except that the basis states are shifted by $t$. Therefore, we need a way of converting these registers to the state $|m\rangle$. However, this conversion depends on both $h$ and $t$, so we first need these quantities.

We will first give a simplified explanation, then expand on the technical details. To determine $h$ and $t$, we compute the prefix sums

$$|s_1\rangle|s_2\rangle\cdots|s_{k+1}\rangle \mapsto |s_1+1\rangle|s_1+s_2+2\rangle\cdots|s_1+s_2+\cdots+s_{k+1}+k+1\rangle. \tag{16}$$

This gives the absolute positions of the ones. The value of $h$ can be determined by finding the first register with a value larger than $m$ (which would give a position for a one past the end of the string).

Now we can identify register $h+1$. For this register, we wish to subtract $t$, so that the state of this register [as in Eq. (11)] becomes $|\phi_{q-t}\rangle$. At this stage we have computed the prefix sums, and subtracting $m+1$ from this modified register gives the same result as subtracting $t$ from the unmodified register. That is, we do not need to explicitly compute $t$ to subtract it, because it is obtained implicitly in the prefix sum. For all the other registers we then undo the prefix sums.

At this stage we have $h$ in an ancilla, and we have subtracted $t$ from register $h+1$. Now we can undo the procedure to prepare $|\phi_q\rangle$ in registers $h+1$ to $k+1$. Register $h+1$ is actually in state $|\phi_{q-t}\rangle$ rather than $|\phi_q\rangle$, but it is a good approximation of state $|\phi_q\rangle$. Therefore the inverse preparation yields states $|0\rangle$ in registers $h+1$ to $k+1$, with this being approximate for register $h+1$. It is trivial to convert $|0\rangle$ to $|m\rangle$, then uncompute the value of $h$ in the ancilla register. This then completes the conversion of the encoding.

In summary the overall procedure is as follows. 

1. Compute the prefix sums. 

2. Compute h = \x\ in an ancilla register. 

3. Uncompute the prefix sums for registers other than $h+1$, and subtract $m+1$ from register $h+1$.

4. Invert the procedure to prepare $|\phi_q\rangle$ from $|0\rangle$ on registers $h+1$ to $k+1$, and swap register $h+1$ with the error flag register.

5. Flip one qubit on registers $h+1$ to $k+1$ to change $|0\rangle$ to $|m\rangle$.

6. Uncompute h in the ancilla register. 
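The arithmetic in steps 1 to 3 can be traced classically on a single computational basis state. The sketch below is only that: a classical trace on one basis state, not a reversible circuit, and it omits the inverse state preparation of steps 4 to 6. The function name `cleanup_basis_state` is a hypothetical helper, and the modulus $m+q+2$ follows the register size discussed in the technical details of this section.

```python
def cleanup_basis_state(regs: list[int], m: int, q: int):
    """Trace steps 1-3 on a basis state |s_1, ..., s_{k+1}>.

    Returns (h, new_regs), where h is the number of ones inside the
    m-bit string and register h+1 has had t subtracted.
    """
    mod = m + q + 2
    # Step 1: prefix sums give the absolute positions of the ones.
    sums, acc = [], 0
    for s in regs:
        acc = (acc + s + 1) % mod
        sums.append(acc)
    # Step 2: h is the index of the first register whose value exceeds m
    # (k+1 if no register does).
    h = next((j for j, v in enumerate(sums) if v > m), len(regs))
    # Step 3: every register except h+1 returns to its original value once
    # its prefix sum is uncomputed; register h+1 keeps its prefix sum minus
    # (m+1), which equals s_{h+1} - t.
    out = list(regs)
    if h < len(regs):
        out[h] = sums[h] - (m + 1)
    return h, out
```

For instance, with $m = 7$, $q = 8$ and registers `[2, 1, 5, 3]` (so $x = 0010100$, $h = 2$, $t = 2$), the prefix sums are 3, 5, 11, 15, and register 3 becomes $11 - 8 = 3 = s_3 - t$.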

Next we explain the technical details, including the error flag register. When computing the prefix sums, we can first consider the case of low-Hamming weight strings with $h \le k$. For the first $h$ registers the result is at most $m$, whereas for register $h+1$, the result is (coherently) more than $m$. To prevent the value in register $h+1$ wrapping around modulo $m$, we instead expand the registers to dimension $m+q+2$, and perform the computations modulo $m+q+2$. Because the value in register $h+1$ is no more than that in $h$ (which is $\le m$) plus $q+1$, the value is $\le m+q+1$, and does not wrap around modulo $m+q+2$. The values in registers $h+2$ to $k+1$ may wrap around, but this does not affect the calculation. This covers steps 1 and 2 above.

Next, considering step 3, the value in register $h+1$ will be

$$s_1 + \ldots + s_h + s_{h+1} + h + 1 = m - t + s_{h+1} + 1. \tag{17}$$

We aim to obtain $s_{h+1} - t$ in this register. If we had computed the value of $t$, we could uncompute the prefix sums, then subtract $t$. However, it is obvious from Eq. (17) that we can just subtract $m+1$ instead. Note that this is the first register that is larger than $m$, so subtracting $m+1$ does not result in a negative number. We also need to uncompute the prefix sums for the other registers. This can be achieved by working backwards from register $k+1$ to $h+2$ uncomputing prefix sums, subtracting $m+1$ from register $h+1$, then uncomputing prefix sums from register $h$ back to 1.
Next we consider the inverse preparation in step 4. At this stage, we have subtracted $t$ from register $h+1$ yielding the exponential state

$$|\phi_{q-t}\rangle = \sum_{j=0}^{q-t-1} \beta\alpha^j|j\rangle + \alpha^{q-t}|q-t\rangle. \tag{18}$$

By choosing $q$ to be sufficiently large, $|\phi_{q-t}\rangle$ is close to $|\phi_q\rangle$, and inverting the procedure for preparing $|\phi_q\rangle$ yields an accurate approximation of $|0^{r+1}\rangle$. To be more precise, note that $\langle\phi_{q-t}|\phi_q\rangle = 1 - (1-\beta)\alpha^{2(q-t)}$. Therefore, we have $\langle\phi_{q-t}|\phi_q\rangle \ge 1 - \epsilon$ if $q \ge m + (1/\beta^2)\log(1/\epsilon)$. To achieve this, $|\phi_q\rangle$ need only consist of $\log(m + 1/\beta^2) + \log\log(1/\epsilon) + O(1)$ qubits. In particular, in our context where $\beta = \Theta(1/\sqrt{m})$, the number of qubits is $\log m + \log\log(1/\epsilon) + O(1)$, so the precision scales double exponentially with the number of additional qubits beyond $\log m$.
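The overlap formula and the sufficient condition on $q$ can be checked numerically. The sketch below evaluates the inner product of the two exponential states directly from their amplitudes and compares it with $1-(1-\beta)\alpha^{2(q-t)}$, which applies for $t \ge 1$ (for $t = 0$ the two states coincide). The function names and the sample parameters are illustrative assumptions.

```python
import math

def phi_state(alpha: float, q: int, dim: int) -> list[float]:
    """Amplitudes of |phi_q> embedded in a register of dimension dim > q."""
    beta = math.sqrt(1.0 - alpha**2)
    v = [0.0] * dim
    for s in range(q):
        v[s] = beta * alpha**s
    v[q] = alpha**q
    return v

def overlap(alpha: float, q: int, t: int) -> float:
    """Inner product <phi_{q-t}|phi_q>."""
    a = phi_state(alpha, q - t, q + 1)
    b = phi_state(alpha, q, q + 1)
    return sum(x * y for x, y in zip(a, b))

if __name__ == "__main__":
    m, eps = 64, 1e-3
    beta2 = 1.0 / (8 * m)
    beta, alpha = math.sqrt(beta2), math.sqrt(1.0 - beta2)
    q = m + math.ceil(math.log2(1 / eps) / beta2)
    for t in (1, m // 2, m):
        formula = 1 - (1 - beta) * alpha ** (2 * (q - t))
        print(t, overlap(alpha, q, t), formula)
```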

This approximate step could alternatively be performed using the state preparation procedure of Grover and Rudolph [7]. Another alternative is to use amplitude amplification to ensure that the register is set to zero correctly. These alternatives would also not be exact, because they would require the coherent calculation of trigonometric functions.

It is convenient for the analysis to swap register $h+1$ with an "error flag" register that has been prepared in the $|0\rangle$ state. Then, if this register is measured as not zero, it flags that the clean-up operation has not occurred properly. On the other hand, register $h+1$ is exactly $|0\rangle$.

We also need to take account of the action of the conversion procedure on the state $|\nu'\rangle$. This state is a superposition of basis states $|s_1, \ldots, s_{k+1}\rangle$, where $s_1 + \ldots + s_{k+1} + k + 1 \le m$. This means that, when we compute the prefix sums, the last register will not be $> m$. In this case, we can set $h = k+1$, and then make no changes to the other registers in steps 3 to 5 for this value of $h$. This means that $|\nu'\rangle$ is unchanged. The exact form of this state is unimportant, because it corresponds to an error. However, $|\nu'\rangle$ is a superposition of strings of Hamming weight $k+1$ encoded using $C_m^k$, and remains so under the conversion.

In summary, the overall preparation procedure is to prepare the state $|\phi_q\rangle^{\otimes k+1}$, then perform the clean-up procedure consisting of steps 1 to 6 above. By choosing $\log q \in O(\log m + \log\log(1/\epsilon))$ (for $q$ a power of two), this then yields the state (6) within distance $O(\epsilon)$. Our circuit has size

$$O\left(k\,[\log m + \log\log(1/\epsilon)]\right). \tag{19}$$

The final state has no values in its registers larger than $m$, so it can be stored in registers of dimension $m+1$, though higher dimensions are required in intermediate steps.

To prepare the state, we have started with all qubits of registers in the state $|0\rangle$. It is convenient to start with these registers in the state $|m\rangle$, flip one qubit in each register to give $|0\rangle$, then perform the preparation procedure as described above. Then we are mapping the state $C_m^k|0^m\rangle$ (which is the state $|m\rangle^{\otimes k+1}$) to the succinct representation of $(\alpha|0\rangle + \beta|1\rangle)^{\otimes m}$ as defined in Eq. (6).

4.5 Phase Gates, Queries and Driving Operations

Applying the phase gates, $P^{\otimes m}$, to the control qubits in their succinct representation is straightforward because $P^{\otimes m}|x\rangle = i^{|x|}|x\rangle$. We need only compute $|x|$ in an ancilla register, apply $|s\rangle \mapsto i^s|s\rangle$, and then uncompute $|x|$ in the ancilla.

To apply the driving operations, we note that our definition of driving Hamiltonian implementation fits perfectly in this context, once we compute the prefix sums to give the positions of the ones, as in Eq. (16). In the compressed representation, $V_1$ is the implementation of the driving Hamiltonian with $t_s$ hardwired to the start time of the segment and $t_f$ controlled by the first register. $V_2$ is the implementation with $t_s$ controlled by the first register and $t_f$ controlled by the second register, and so on. At the end, the prefix sums can be uncomputed.
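As a classical illustration of how the succinct registers feed the phase gates and the driving operations, the sketch below takes register values $(s_1, \ldots, s_h, m, \ldots, m)$, recovers the positions of the ones by prefix sums, and returns the phase $i^{|x|}$ together with the $(t_s, t_f)$ pair for each driving operation between consecutive ones. The mapping of position $p$ to time $t_0 + p/(4m)$ is an assumption made for this sketch (one slot per fractional query in a segment of length 1/4), as are the function name and its defaults.

```python
def control_schedule(regs: list[int], m: int, t0: float = 0.0,
                     segment: float = 0.25):
    """Derive the phase i^{|x|} and the (t_s, t_f) windows of V_1, V_2, ...
    from succinct registers (s_1, ..., s_h, m, ..., m)."""
    positions, acc = [], 0
    for s in regs:
        if s == m:           # padding value: no further ones
            break
        acc += s + 1         # prefix sum = position of this one
        positions.append(acc)
    h = len(positions)       # Hamming weight |x|
    phase = 1j ** h          # P^{(m)}|x> = i^{|x|}|x>
    # Assumed time convention: position p corresponds to t0 + segment*p/m.
    times = [t0] + [t0 + segment * p / m for p in positions]
    windows = list(zip(times[:-1], times[1:]))
    return phase, windows
```

With registers `[2, 1, 7, 7]` and $m = 7$ the ones sit at positions 3 and 5, so the phase is $i^2 = -1$ and two windows are returned, the first starting at $t_0$.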

4.6 The Value of m Needed 

In the CGMSY construction the number of fractional queries $m$ comes from breaking up the evolution under the oracle and the driving Hamiltonian via a product formula. To obtain error bounded by $\epsilon_{\rm tot}$ with evolution over time $T$ and driving Hamiltonian with norm $\|H\|$, the number of time intervals needed in a Trotter-Suzuki product formula for constant Hamiltonian $H$ is $O(\|H\|T(\|H\|T/\epsilon_{\rm tot})^{\delta})$ [8]. For the CGMSY construction the intervals need to be of equal size, which restricts $\delta$ to 1/2.

For time-dependent Hamiltonians, the complexity of Trotter-Suzuki product formulae will depend on the magnitude of the derivatives of $H$ when one is sampling the Hamiltonians at different times [8]. The situation we have here is somewhat different, because we assume that the evolution under the time-dependent driving Hamiltonian can be implemented. In this case, the error does not depend on the time derivative, and the error for a short time interval $\delta t$ can be bounded as $\|H\|\delta t^2$ (this is easily derived from Eq. (2.3) of Ref. [9]). Hence the number of intervals to limit the overall error to $O(\epsilon_{\rm tot})$ need be no greater than $O(\|H\|T^2/\epsilon_{\rm tot})$. The number of intervals in one CGMSY segment of length $O(1)$ is therefore $m = O(\|H\|T/\epsilon_{\rm tot})$.

Another question is the precision that the time needs to be specified to in order to limit the overall error to $\epsilon_{\rm tot}$. It is easily shown that the error in the time needs to be bounded as $O(\epsilon'/\|H\|)$ in order to limit the error in a single operation to $\epsilon'$. If the time is being specified on the interval $[0, T]$, then the number of bits needed for the time is $\lceil\log(\|H\|T/\epsilon')\rceil$. Because there are $O(1)$ controlled Hamiltonian evolutions in each CGMSY segment, we need $\epsilon' = O(\epsilon_{\rm tot}/T)$. This gives the number of bits for the time as $\log(\|H\|T^2/\epsilon_{\rm tot}) + O(1)$ (where the constant $O(1)$ is because $\epsilon'$ may have a constant of proportionality with $\epsilon_{\rm tot}/T$).

This result is consistent with the value of $m$ used, because $\log(\|H\|T^2/\epsilon_{\rm tot}) + O(1)$ bits are needed to specify an integer from 0 to $O(mT)$. In the CGMSY construction, a superposition over the $m$ time intervals is used, so the number of qubits needed is $\lceil\log m\rceil$. The number of the CGMSY segment also needs to be stored, but that can be stored in $O(\log T)$ classical bits.

One can use the number of bits for the time to place a lower bound on the complexity of implementing the driving Hamiltonian. To obtain overall accuracy $O(\epsilon_{\rm tot})$, the driving Hamiltonian needs accuracy of $O(\epsilon_{\rm tot}/\|H\|T)$ in the time. There are $O(\|H\|T^2/\epsilon_{\rm tot})$ starting and finishing times, so by a counting argument the gate complexity is $\Omega(\log(\|H\|T/\epsilon_{\rm tot}))$. If the driving Hamiltonian is constant, then it is only the length of the time which is important, and that is limited to $O(1)$. The number of times is then $\Theta(\|H\|T/\epsilon_{\rm tot})$, but the lower bound on the complexity is still $\Omega(\log(\|H\|T/\epsilon_{\rm tot}))$. For constant error we therefore have $G = \Omega(\log(\|H\|T))$, as used in Section 3.

5 Measurement of the Control Qubits 

What remains is to perform the final measurement. This should logically correspond to what 
happens if the state is decoded from its succinct representation to m qubits and then, for each 
qubit, an $R$ gate is applied and it is measured in the computational basis. Of course, this cannot 
be literally implemented this way, because it would increase the gate and space usage to at least 
$m$; our task is to logically perform this while remaining in the succinct representation. 

Recall now that in Section 3.3 we constructed a procedure that approximately prepares $R^{\otimes m}|0^m\rangle$ 
in succinct form [see Eq. (6)]. We define $U_m$ to be the ideal unitary that would exactly prepare the 
state (6). The action of the ideal state preparation procedure is then $U_m C_m^k|0^m\rangle \approx C_m^k R^{\otimes m}|0^m\rangle$. 
The procedure we have described does not exactly perform this unitary, but it is within distance 
$O(\epsilon)$. Also, we do not have an exact equality, because representations of terms with Hamming 
weight greater than $k$ in $R^{\otimes m}|0^m\rangle$ are not obtained with the correct weights. More precisely, we 
have 

$$U_m C_m^k|0^m\rangle = \sum_{\substack{x\in\{0,1\}^m \\ |x|\le k}} \alpha^{m-|x|}\beta^{|x|}\, C_m^k|x\rangle + \mu|\psi\rangle. \tag{20}$$

This is to be compared with the uncompressed setting [Eq. (13)], in which we have 

$$R^{\otimes m}|0^m\rangle = \sum_{|x|\le k} \alpha^{m-|x|}\beta^{|x|}\, |x\rangle + \mu|\nu\rangle. \tag{21}$$

In terms of the logical data, $U_m$ and $R^{\otimes m}$ produce almost the same state when applied to $|0^m\rangle$. 

Returning to the issue of measurement, in the uncompressed basis we would like to perform 
$R^{\otimes m}$, then perform a computational basis measurement. In the particular case that the 
computational basis measurement yielded all zeros, the measurement operator is $|0^m\rangle\langle 0^m|R^{\otimes m}$. Because 
we are performing all operations in the compressed basis, this measurement operator can be 
represented by $C_m^k|0^m\rangle\langle 0^m|R^{\otimes m}(C_m^k)^\dagger$. Because $R$ is self-inverse, this is approximately the same as 
$C_m^k|0^m\rangle\langle 0^m|(C_m^k)^\dagger U_m^\dagger$. That is, to achieve this measurement result we first invert the preparation 
procedure described by $U_m$. Then, because $C_m^k|0^m\rangle = |m\rangle^{\otimes (k+1)}$ is a computational basis state, we 
can achieve the desired result by performing a computational basis measurement. 

Ideally, this is what we want, but we also need to be able to find the positions of the ones in the 
case that the all-zero string is not obtained. At first glance, one might imagine that applying $U_m^\dagger$ 
in place of $R^{\otimes m}$ would yield a succinct representation of the final outcome state, so measuring in 
the computational basis would provide the correct result. Unfortunately, this does not accurately 
simulate the final measurement except in the case where the all-zero string is obtained. The problem 
is that $U_m$ and $R^{\otimes m}$ are only in close agreement when applied to the logical state $|0^m\rangle$. For any 
other logical state $|x\rangle$ (for non-zero $x \in \{0,1\}^m$), applying $U_m$ and $R^{\otimes m}$ need not yield states in 
any close agreement. 

Our first observation towards overcoming this problem is that we can at least perform an 
incomplete measurement that captures a seemingly small part of what we are seeking: we can cause 
the state to either collapse to logical $|0^m\rangle$ or to the subspace that is the orthogonal complement 
of this state, and with the correct probabilities. This is achieved by performing $U_m^\dagger$ and then the 
2-outcome incomplete projective measurement that distinguishes between the logical state $|0^m\rangle$ 
and its orthogonal complement $|0^m\rangle^\perp$, and then applying $U_m$ to the resulting collapsed state. Our 
method to complete the measurement is to apply the above procedure recursively, on the two halves 
of the logical string. We now first motivate this procedure intuitively, followed by further technical 
details and a rigorous proof of correctness. 

5.1 Measuring in Succinct Form: Intuition 

The intuition behind our measurement strategy is given by the following simple thought experiment. 
Consider the problem of measuring an m-qubit state in the computational basis. This can be 
accomplished by performing a sequence of two-outcome measurements in a variety of ways. One 
obvious approach is to measure the state of the first qubit, then the second qubit, and so on. Each 
final outcome $x \in \{0,1\}^m$ will occur with exactly the same probability as with the original complete 
measurement. We now describe an alternative — and unconventional — approach for simulating the 
same measurement. 

First, perform the measurement distinguishing between $|0^m\rangle$ and $|0^m\rangle^\perp$, its orthogonal 
complement. If the state collapses to $|0^m\rangle$ we halt, outputting $0^m$. Otherwise (when the state collapses 
to $|0^m\rangle^\perp$), apply the measurement $|0^{m/2}\rangle$ vs. $|0^{m/2}\rangle^\perp$ to the first $m/2$ qubits. If that part of the 
state collapses to $|0^{m/2}\rangle$ then output $0^{m/2}$ for the first $m/2$ bits; otherwise recurse further. Once 
this recursive measurement procedure for the first $m/2$ qubits has terminated, repeat it for the 
second $m/2$ qubits. Each final outcome $x \in \{0,1\}^m$ occurs with exactly the same probability as 
with the original complete measurement. Note that although this process may appear complicated, 
it terminates quickly whenever the Hamming weight of the final outcome $x$ is small: for Hamming 
weight up to $k'$, at most $k'\log m$ steps are performed. 
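
The claim that this recursive sequence of two-outcome measurements reproduces the statistics of a complete computational-basis measurement can be checked on a small example. The sketch below is an illustration only and is not part of the construction: it represents a state as a dictionary from bit strings to amplitudes, enumerates every branch of the recursion exactly (rather than sampling), and accumulates the probability of each final outcome. The function name and the example state are hypothetical.

```python
def branch_probabilities(state, m):
    """Return {outcome: probability} for the recursive all-zero-vs-rest measurement.

    state maps m-bit tuples to amplitudes; m is assumed to be a power of two.
    """
    results = {}

    def run(branch, pending):
        # branch: unnormalised post-measurement state; pending: blocks [lo, hi) still to test.
        if not branch:
            return  # this sequence of results has probability zero
        if not pending:
            # Every bit has been determined, so the branch sits on a single basis string.
            for bits, amp in branch.items():
                results[bits] = results.get(bits, 0.0) + abs(amp) ** 2
            return
        (lo, hi), rest = pending[0], pending[1:]
        zero = {b: a for b, a in branch.items() if not any(b[lo:hi])}
        nonzero = {b: a for b, a in branch.items() if any(b[lo:hi])}
        run(zero, rest)  # result 0: the block is all zeros
        if hi - lo == 1:
            run(nonzero, rest)  # result 1 on a single qubit: that bit is one
        else:
            mid = (lo + hi) // 2
            run(nonzero, [(lo, mid), (mid, hi)] + rest)  # result 1: recurse on both halves

    run(dict(state), [(0, m)])
    return results

example_state = {
    (0, 0, 0, 0): 0.6,
    (0, 1, 0, 0): 0.48j,
    (0, 1, 1, 0): -0.64,
}
recursive = branch_probabilities(example_state, 4)
print(recursive)
```

Each outcome probability equals the squared magnitude of the corresponding amplitude, because every projector used in the recursion is diagonal in the computational basis.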

Our actual scenario differs from the one described above in that the final measurement is 
in the basis $\{R^{\otimes m}|x\rangle : x \in \{0,1\}^m\}$ rather than the computational basis. However, our logical $U_m$ 
and $U_m^\dagger$ permit us to approximate the $R^{\otimes m}|0^m\rangle$ vs. $R^{\otimes m}|0^m\rangle^\perp$ measurement well. Also, making 
use of the fact that the underlying operation that we are simulating has a tensor product 
structure, $R^{\otimes m}|x_1x_2\rangle = R^{\otimes m/2}|x_1\rangle R^{\otimes m/2}|x_2\rangle$ for any $x_1, x_2 \in \{0,1\}^{m/2}$, we can emulate the recursive 
procedure in the above thought experiment. We now make this rigorous. 



5.2 Measuring in Succinct Form: Details 

We now introduce Alg. 5.2, which formalises the intuition behind the recursive measurement 
outlined above, and show that it simulates the desired measurement in succinct form. Recall that we 
assume without loss of generality that $m$ is a power of 2. 

Before stating Alg. 5.2 we require a lemma which allows us to efficiently "split" the encoded 
version of string $x = x_1x_2$ into the concatenation of the encoded versions of $x_1$ and $x_2$. 

Lemma 5.1. Let $x = x_1x_2$ for $x \in \{0,1\}^m$ with $|x| \le k$, $x_1, x_2 \in \{0,1\}^{m/2}$ and $m$ a power of 2. 
Then there exists a quantum circuit with complexity $O(k\log m)$ for achieving the mapping 

$$C_m^k|x_1x_2\rangle \mapsto C_{m/2}^k|x_1\rangle\, C_{m/2}^k|x_2\rangle, \tag{22}$$

where $C_m^k|x_1x_2\rangle,\ C_{m/2}^k|x_1\rangle,\ C_{m/2}^k|x_2\rangle \in (\mathbb{C}^{m+1})^{\otimes (k+1)}$. 

Proof. Because both $C_m^k|x_1x_2\rangle$ and $C_{m/2}^k|x_1\rangle \otimes C_{m/2}^k|x_2\rangle$ are computational basis states, the 
procedure that is performed is the same as would be performed classically, except that it must be 
performed coherently. That is, there is a reversible classical procedure to split the encoding in the 
computational basis, which immediately provides a coherent procedure for splitting the encoding. 
Because there are $O(k)$ registers of size $O(\log m)$, the complexity of this procedure is $O(k\log m)$. □ 
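
The classical procedure underlying the proof can be sketched as follows. The precise form of the encoding $C_m^k$ is not restated in this section, so the sketch assumes, purely for illustration, that an encoded string is the sorted list of (0-based) positions of its ones stored in $k+1$ registers and padded with the sentinel value $m$; this is at least consistent with $C_m^k|0^m\rangle = |m\rangle^{\otimes(k+1)}$. The function names are hypothetical.

```python
def encode(x, k):
    """Assumed encoding: sorted positions of the ones in x, padded with len(x) to k+1 registers."""
    m = len(x)
    ones = [i for i, bit in enumerate(x) if bit]
    if len(ones) > k + 1:
        raise ValueError("Hamming weight exceeds the number of registers")
    return ones + [m] * (k + 1 - len(ones))

def split_encoding(enc, m):
    """Split an encoding of x = x1 x2 into encodings of x1 and x2 (each of length m/2)."""
    half = m // 2
    registers = len(enc)
    left = [p for p in enc if p < half]               # ones that fall in the first half
    right = [p - half for p in enc if half <= p < m]  # ones in the second half, re-indexed
    pad = lambda regs: regs + [half] * (registers - len(regs))  # sentinel is now m/2
    return pad(left), pad(right)

x = [0, 1, 0, 0, 0, 0, 1, 0]
left, right = split_encoding(encode(x, k=2), m=len(x))
print(left, right)
```

Each of the $k+1$ registers is touched a constant number of times and holds $O(\log m)$ bits, in line with the $O(k\log m)$ cost stated in the lemma; a coherent version would additionally have to be made reversible.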

The formal statement of the recursive measurement algorithm is given in Alg. 5.2. To perform 
our recursive measurement, we simply call MEASURE($A$, $1$, $m$), where $A$ is the register containing 
our compressed control qubits. Once the procedure finishes running, it will return the locations of 
all the ones an uncompressed measurement would have obtained when measuring the uncompressed 
version of $A$. We truncate the recursive measurement procedure if $k'$ ones have been located, to 
limit the complexity of the procedure. 

Algorithm 5.2. $S$ = MEASURE($A$, $m_1$, $m_2$). 

• Input: $A$ - Registers corresponding to space $(\mathbb{C}^{m+1})^{\otimes (k+1)}$ containing the subset $\{m_1, \ldots, m_2\}$ 
  of the encoded control qubits. 
  $m_1$ - The starting index $m_1 \in [m]$ of the encoded qubits in $A$. 
  $m_2$ - The ending index $m_2 \in [m]$ of the encoded qubits in $A$. 

• Precondition: $m_2 - m_1 + 1$ is a power of two. 

• Output: A set of indices $S \subseteq [m]$ containing the positions where a measurement in the 
  uncompressed setting would have found ones. 

Perform a measurement described by the measurement operators $M^{m_2-m_1+1}_{c,0}$ and $M^{m_2-m_1+1}_{c,1}$, where 
$M^n_{c,0} := U_n C_n^k|0^n\rangle\langle 0^n|(C_n^k)^\dagger U_n^\dagger$ and $M^n_{c,1} := I - M^n_{c,0}$. Label the measurement result $d$. Then 

1. (Zero detected) If $d = 0$: Return $S = \emptyset$. 

2. (Base case) If $d = 1$ and $m_1 = m_2$: Return $S = \{m_1\}$. 

3. (Recurse) If $d = 1$ and $m_2 > m_1$: Split $A$ into $A_1$ and $A_2$, containing the encoded forms of the 
   first and second halves, respectively, of the control qubits. Then return 

$$S = \text{MEASURE}(A_1, m_1, (m_1 + m_2 - 1)/2) \cup \text{MEASURE}(A_2, (m_1 + m_2 + 1)/2, m_2). \tag{23}$$
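
The control flow of Alg. 5.2, including the index arithmetic of Eq. (23) and the truncation after $k'$ ones, can be sketched classically as follows. This is only a skeleton under stated assumptions: the quantum measurement is replaced by a hypothetical callback `zero_test(m1, m2)` that returns the result $d$, and in the demonstration the callback simply inspects a fixed classical set of one-positions. Register splitting is not modelled.

```python
def measure(zero_test, m1, m2, found, k_prime):
    """Classical skeleton of MEASURE(A, m1, m2) with 1-based inclusive indices."""
    if len(found) >= k_prime:
        return                      # truncation: k' ones have already been located
    d = zero_test(m1, m2)           # stands in for the measurement {M_{c,0}, M_{c,1}}
    if d == 0:
        return                      # zero detected: no ones in this block
    if m1 == m2:
        found.add(m1)               # base case: a single qubit that is one
        return
    measure(zero_test, m1, (m1 + m2 - 1) // 2, found, k_prime)   # first half
    measure(zero_test, (m1 + m2 + 1) // 2, m2, found, k_prime)   # second half

def run_measure(ones, m, k_prime):
    """Demonstration driver: 'ones' is a hypothetical hidden set of positions in 1..m."""
    found = set()
    zero_test = lambda m1, m2: int(any(m1 <= p <= m2 for p in ones))
    measure(zero_test, 1, m, found, k_prime)
    return found

print(run_measure({2, 6, 7}, m=8, k_prime=8))   # no truncation
print(run_measure({2, 6, 7}, m=8, k_prime=2))   # stops once two ones are located
```

With truncation the procedure reports only the first $k'$ ones it encounters, which bounds the number of measurements regardless of the actual Hamming weight.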

We now introduce a notation that will be used throughout the remainder of the paper in order to 
simplify reference to quantities in the uncompressed protocol versus the compressed protocol. For 
quantities (states, operators or probabilities) in the compressed protocol, we will use a superscript 
or subscript "c", whereas we will use "u" for the uncompressed protocol. To refer to quantities 
defined for both, we will use "$\eta$". We also use $n$ to refer to operations acting on a compressed 
sub-portion of the string of length $n$ (instead of $m$ for the full string). 

To perform the measurement described by the measurement operators $M^n_{c,d}$ in Alg. 5.2, we 
apply $U_n^\dagger$, perform the measurement that distinguishes the encoded all-zero state from all other 
states, then apply $U_n$. In this form it is clear why we need to perform the operation $U_n$ after the 
measurement: it means that all states orthogonal to that corresponding to measurement result 0 
are unchanged, because they are just acted upon by the identity. The final $U_n$ operation is also 
included for the measurement result 0 for simplicity, but it is not needed. As these measurement 
operators are projections, they are the same as the positive operator-valued measure elements. 

For simplicity we have described the measurement in terms of $U_n$ and $U_n^\dagger$, but in reality we will 
use operations on an expanded space that includes an error-flag ancilla. Recall that, because \<t>q-t) 
is not exactly equal to \4>q), we have a register that is not exactly reset to zero, and this is swapped 
into an ancilla register. The unitary operations in this expanded space will be denoted $\tilde U_n$ and $\tilde U_n^\dagger$. 
Then the action of $\tilde U_n$ is 

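$$\tilde U_n C_n^k|0^n\rangle \otimes |0\rangle = \sum_{\substack{x\in\{0,1\}^n \\ |x|\le k+1}} \xi^n_x \left[\sqrt{1-\epsilon_x}\; C_n^k|x\rangle \otimes |0\rangle + \sqrt{\epsilon_x}\; |{\rm err}_x\rangle \otimes |1\rangle\right]. \tag{24}$$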

Here the tensor product with $|0\rangle$ on the left-hand side indicates the use of ancillas that are initially 
in the state zero. The amplitudes $\xi^n_x$ are the amplitudes for each $C_n^k|x\rangle$ in the state (6) (when 
$m$ is replaced with $n$). These amplitudes include those for $|x| = k+1$, for the state which 
corresponds to encoded Hamming-weight $k+1$ states. For $|x| \le k$, we have $\xi^n_x = \alpha^{n-|x|}\beta^{|x|}$. The 
tensor product with $|0\rangle$ on the right-hand side indicates ancillas that will be set to zero in the case 
of success. The parameter $\epsilon_x$ is $\le \epsilon$, and can in general depend on $x$. The state $|{\rm err}_x\rangle$ is an error 
state. 

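For the ideal state preparation, we have 

$$\langle x|(C_n^k)^\dagger U_n C_n^k|0^n\rangle = \xi^n_x. \tag{25}$$

Using the expression for the action of $\tilde U_n$, we find 

$$\left[(\langle x|(C_n^k)^\dagger) \otimes \langle 0|\right]\left[\tilde U_n C_n^k|0^n\rangle \otimes |0\rangle\right] = \xi^n_x\sqrt{1-\epsilon_x} = \langle x|(C_n^k)^\dagger U_n C_n^k|0^n\rangle\,[1 - O(\epsilon)]. \tag{26}$$

There is no contribution from the error register, because the error flag is orthogonal to zero for 
that register. 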

To perform the measurement, we append ancillas in the zero state, and perform $\tilde U_n^\dagger$. Then we 
perform the measurement that projects onto $C_n^k|0^n\rangle \otimes |0\rangle$ and its orthogonal complement. Here 
the tensor product with $|0\rangle$ indicates the extra ancillas used by the full preparation procedure $\tilde U_n$. 
Then we perform $\tilde U_n$. 

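The action of this measurement will have error $O(\epsilon)$ from that used in the algorithm. First, 
consider the resulting state for zero measurement result and initial state $C_n^k|x\rangle$. 

$$\begin{aligned}
\tilde U_n\left[C_n^k|0^n\rangle\langle 0^n|(C_n^k)^\dagger \otimes |0\rangle\langle 0|\right]\tilde U_n^\dagger C_n^k|x\rangle|0\rangle
&= \tilde U_n C_n^k|0^n\rangle \otimes |0\rangle \left[\langle x|\langle 0|(C_n^k)^\dagger \tilde U_n C_n^k|0^n\rangle|0\rangle\right]^* \\
&= \tilde U_n C_n^k|0^n\rangle \otimes |0\rangle \left[\langle x|(C_n^k)^\dagger U_n C_n^k|0^n\rangle\right]^* [1 - O(\epsilon)] \\
&= \tilde U_n C_n^k|0^n\rangle \otimes |0\rangle \left[\langle 0^n|(C_n^k)^\dagger U_n^\dagger C_n^k|x\rangle\right] [1 - O(\epsilon)].
\end{aligned} \tag{27}$$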

Therefore we find that the probability of this result is changed by no more than $O(\epsilon)$. In addition, 
tracing over the ancillas used, $\tilde U_n C_n^k|0^n\rangle \otimes |0\rangle$ is an approximation of $U_n C_n^k|0^n\rangle$ with trace distance 
$O(\epsilon)$. Therefore, for the zero measurement result, the resulting state has trace distance no more 
than $O(\epsilon)$ from that for the ideal measurement using $U_n$. 

The resulting state for measurement result 1 is then 

$$C_n^k|x\rangle|0\rangle - \tilde U_n C_n^k|0^n\rangle \otimes |0\rangle \left[\langle 0^n|(C_n^k)^\dagger U_n^\dagger C_n^k|x\rangle\right][1 - O(\epsilon)]. \tag{28}$$

This is because $\tilde U_n^\dagger$ is exactly the inverse of $\tilde U_n$ in the expanded space. Trivially from the result for 
the zero measurement result, once we trace over the ancilla the resulting state has trace distance no 
more than $O(\epsilon)$ from that for the ideal measurement. As a result, even though we cannot perform 
$U_n$ exactly, we can approximate the measurements within error $O(\epsilon)$ using the $\tilde U_n$ and $\tilde U_n^\dagger$ operations. 

To show that the algorithm correctly simulates the desired uncompressed measurement, we 
consider a similar recursive measurement on the uncompressed state. We show that, except for 
the imprecision due to approximating $U_n$ and omitting high Hamming weight components, the low 
Hamming weight portions of the states in Eqs. (20) and (21) evolve identically. Moreover, this 
holds even if the control qubits are entangled with a target register, as is generally the case here. 

In the uncompressed setting, the state of the control and target registers before the final 
measurement can be described as approximately 

$$|\psi_u\rangle := \sum_{\substack{x\in\{0,1\}^m \\ |x|\le k}} \gamma_x^{()}\, |x\rangle|w_x\rangle, \tag{29}$$

where $|w_x\rangle$ describes the state of the target register where the queries $Q$ are applied. Note that 
$|\psi_u\rangle$ is unnormalised, as we have omitted the high Hamming weight component. Similarly, in the 
compressed setting, before the final measurement we approximately have the (unnormalised) state 

$$|\psi_c\rangle := \sum_{\substack{x\in\{0,1\}^m \\ |x|\le k}} \gamma_x^{()}\, C_m^k|x\rangle|w_x\rangle, \tag{30}$$

where the states $|w_x\rangle$ coincide with those in the uncompressed case. The coefficients $\gamma_x^{()}$ are the 
same in each case, and are equal to $\xi^m_x$. We use this notation for consistency with the coefficients 
for the intermediate states in Eqs. (32) and (33) below. 

We consider a measurement in the uncompressed case that is the same as in Alg. 5.2. We show 
that the results obtained in the two cases are close, but there are two sources of error: (1) the 
error incurred due to the high Hamming weight component of the state, and (2) the error due 
to not implementing U n exactly. First we discuss the error-free case, i.e. where (1) we omit the 
high Hamming weight component, and where (2) U n is implemented exactly. We subsequently 
reintroduce both sources of error and analyse their impacts. In the error-free analysis, we show the 
following. 

Theorem 5.3 (Error-free simulation). Assume we are in the error-free setting defined above. Then, 
suppose that before the final measurement, the states of the uncompressed and compressed control 
and target qubits are given by Eqs. (29) and (30), respectively. Then, Alg. 5.2 exactly simulates the 
uncompressed $R^{\otimes m}$ measurement in the following sense: 

1. After running Alg. 5.2, the probability of obtaining a given measurement result is the same 
as for the uncompressed $R^{\otimes m}$ measurement, and 

2. for a given measurement result the state of the target register in both uncompressed and 
compressed settings matches. 

Proof. Measuring $R^{\otimes m}|\psi\rangle$ in the computational basis can also be simulated using a recursive 
approach; namely, we apply $R^{\otimes m}$, followed by the incomplete measurement of $|0^m\rangle$ versus its 
orthogonal complement, then apply $R^{\otimes m}$. This can be represented by the measurement operators $M^n_{u,d}$, 
with 

$$M^n_{u,0} := R^{\otimes n}|0^n\rangle\langle 0^n|R^{\otimes n}, \tag{31}$$

and $M^n_{u,1} := I - M^n_{u,0}$. Similar to Alg. 5.2, we are including the application of $R^{\otimes m}$ for both 
measurement results for simplicity, though it is not needed for result 0. If we obtain 1 as the 
outcome, we recurse on the two blocks of $m/2$ qubits by applying the measurement with operators 
$M^{m/2}_{u,d}$, and so forth. 

To prove the result, we simply need to show that at each step in the recursion the states resulting 
from measurement operators $M^n_{c,d}$ and $M^n_{u,d}$ are equivalent. Let us denote the measurement result 
obtained at each step in the recursive measurement scheme by $d_j$. Then, at step $\ell$, we have 
measurement results $d_1, \ldots, d_{\ell-1}$, and will have a state that depends on those measurement results. 
Let us assume that at this step we have equivalent states for the compressed and uncompressed 
cases. The base case is that for $\ell = 1$, where the initial states (29) and (30) are equivalent. Then 
the states for the two cases can be expressed as 

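$$|\psi_c^{(d_1,\ldots,d_{\ell-1})}\rangle = \sum_{|x|\le k} \gamma_x^{(d_1,\ldots,d_{\ell-1})}\, (C_n^k \otimes C^k_{\rm rest})|x\rangle|w_x\rangle, \tag{32}$$

$$|\psi_u^{(d_1,\ldots,d_{\ell-1})}\rangle = \sum_{|x|\le k} \gamma_x^{(d_1,\ldots,d_{\ell-1})}\, |x\rangle|w_x\rangle. \tag{33}$$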

At this stage the encoding will be a succinct encoding on a subset of $n$ of the digits of $x$, and another 
encoding of the remaining digits, the exact form of which is unimportant for this analysis. The 
subset of $n$ of the digits of $x$ will depend on $d_1, \ldots, d_{\ell-1}$. This dependence has not been indicated 
here for brevity. We also omit $x \in \{0,1\}^m$ from the sum for brevity. 

In order for the results obtained for the compressed and uncompressed cases to be equivalent, 
all that is required is that the amplitude weightings $\gamma_x^{(d_1,\ldots,d_{\ell-1})}$ in Eqs. (32) and (33) are the same. 
The results are equivalent in the sense that the probability of the measurement results, as well as 
the state of the target system for a given measurement result, are the same. The probability of the 
measurement results will be obtained from the normalisation of the state, which must be the same 
if the amplitudes are the same. Similarly the resulting state in the target system will be the same 
if the amplitudes are the same. 

We will adopt the notation that $I_{\rm rest}$ indicates the identity on the remaining registers, so the 
overall measurement operator is $M^n_{\eta,d} \otimes I_{\rm rest}$. We will also adopt the notation that $x_n$ is the subset 
of $n$ digits of the string $x$, and $x_{\rm rest}$ is the remaining digits. Then we have 

$$\langle 0^n|R^{\otimes n}|x_n\rangle = \langle x_n|R^{\otimes n}|0^n\rangle = \alpha^{n-|x_n|}\beta^{|x_n|} = \langle x_n|(C_n^k)^\dagger U_n C_n^k|0^n\rangle = \langle 0^n|(C_n^k)^\dagger U_n^\dagger C_n^k|x_n\rangle. \tag{34}$$

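For the compressed case, consider performing the measurement with operators $M^n_{c,d}$. In the case 
that the measurement result is $d = 0$, our compressed state becomes 

$$\begin{aligned}
(M^n_{c,0} \otimes I_{\rm rest})|\psi_c^{(d_1,\ldots,d_{\ell-1})}\rangle
&\approx \left(U_n C_n^k|0^n\rangle\langle 0^n|(C_n^k)^\dagger U_n^\dagger \otimes I_{\rm rest}\right) \sum_{|x|\le k} \gamma_x^{(d_1,\ldots,d_{\ell-1})} (C_n^k \otimes C^k_{\rm rest})|x\rangle|w_x\rangle \\
&= \sum_{|x|\le k} \gamma_x^{(d_1,\ldots,d_{\ell-1})} \left(\langle 0^n|(C_n^k)^\dagger U_n^\dagger C_n^k|x_n\rangle\right) U_n C_n^k|0^n\rangle\, C^k_{\rm rest}|x_{\rm rest}\rangle|w_x\rangle \\
&= \sum_{|x|\le k} \gamma_x^{(d_1,\ldots,d_{\ell-1})} \left(\langle 0^n|R^{\otimes n}|x_n\rangle\right) U_n C_n^k|0^n\rangle\, C^k_{\rm rest}|x_{\rm rest}\rangle|w_x\rangle \\
&\approx \sum_{|x|\le k} \gamma_x^{(d_1,\ldots,d_{\ell-1})} \left(\langle 0^n|R^{\otimes n}|x_n\rangle\right) \Big(\sum_{|y|\le k} \xi^n_y\, C_n^k|y\rangle\Big) C^k_{\rm rest}|x_{\rm rest}\rangle|w_x\rangle =: |\psi_c'^{(d_1,\ldots,d_{\ell-1},0)}\rangle,
\end{aligned} \tag{35}$$

where $\xi^n_y = \alpha^{n-|y|}\beta^{|y|}$, and $y$ is an $n$-digit string. The approximate equality in the first line 
of Eq. (35) is because the measurement operator $M^n_{c,0}$ cannot be obtained exactly, because the 
unitary $U_n$ is not performed exactly. The approximate equality in the last line is because the 
high Hamming weight components have been omitted. In the error-free setting the error in these 
approximations is ignored. 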

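In comparison, in the uncompressed setting, a similar calculation yields, for $d = 0$, 

$$\begin{aligned}
(M^n_{u,0} \otimes I_{\rm rest})|\psi_u^{(d_1,\ldots,d_{\ell-1})}\rangle
&= \left(R^{\otimes n}|0^n\rangle\langle 0^n|R^{\otimes n} \otimes I_{\rm rest}\right)|\psi_u^{(d_1,\ldots,d_{\ell-1})}\rangle \\
&\approx \sum_{|x|\le k} \gamma_x^{(d_1,\ldots,d_{\ell-1})} \left(\langle 0^n|R^{\otimes n}|x_n\rangle\right) \Big(\sum_{|y|\le k} \xi^n_y\, |y\rangle\Big) |x_{\rm rest}\rangle|w_x\rangle =: |\psi_u^{(d_1,\ldots,d_{\ell-1},0)}\rangle.
\end{aligned} \tag{36}$$

The approximate equality in the last line is again due to omitting high Hamming weight 
components. In the error-free setting the error in this approximation is ignored. In the case that the 
measurement result is $d = 1$, then the states obtained are 

$$\begin{aligned}
\left[(I - M^n_{c,0}) \otimes I_{\rm rest}\right]|\psi_c^{(d_1,\ldots,d_{\ell-1})}\rangle &= |\psi_c^{(d_1,\ldots,d_{\ell-1})}\rangle - |\psi_c'^{(d_1,\ldots,d_{\ell-1},0)}\rangle =: |\psi_c'^{(d_1,\ldots,d_{\ell-1},1)}\rangle, \\
\left[(I - M^n_{u,0}) \otimes I_{\rm rest}\right]|\psi_u^{(d_1,\ldots,d_{\ell-1})}\rangle &= |\psi_u^{(d_1,\ldots,d_{\ell-1})}\rangle - |\psi_u^{(d_1,\ldots,d_{\ell-1},0)}\rangle =: |\psi_u^{(d_1,\ldots,d_{\ell-1},1)}\rangle.
\end{aligned} \tag{37}$$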

Above we have defined resulting states after the measurements in the uncompressed and 
compressed setting of $|\psi_u^{(d_1,\ldots,d_\ell)}\rangle$ and $|\psi_c'^{(d_1,\ldots,d_\ell)}\rangle$, respectively. The quantity $|\psi_c'^{(d_1,\ldots,d_\ell)}\rangle$ is the state 
in the compressed case before the change in the compression. To obtain the state $|\psi_c^{(d_1,\ldots,d_\ell)}\rangle$, the 
compression of the string must be changed as per Lemma 5.1. This can be done without error, and 
does not change the amplitudes. 

Omitting the high Hamming weight states, we start with states $|\psi_\eta\rangle$, which have the same 
amplitudes in the compressed and uncompressed cases. Then, by the above reasoning, if the 
amplitudes are the same at step $\ell - 1$, they are the same at step $\ell$. Therefore, by induction, the 
amplitudes must be the same after the full recursive measurement. Therefore the same amplitudes 
are obtained for the compressed and uncompressed cases, so the results obtained in the compressed 
and uncompressed cases are equivalent. That is, the probabilities of the measurement results and 
the state of the target register for a given measurement result match. □ 

Theorem 5.3 shows that if we focus solely on the low Hamming weight subspace, and if we 
assume we can prepare the state $U_n C_n^k|0^n\rangle$ exactly, then our succinct recursive measurement Alg. 5.2 
perfectly simulates the uncompressed measurement. We now analyse the error incurred when these 
two assumptions are dropped. First we need to identify the appropriate measure of the error in the 
measurement. We would like to bound the average trace distance; i.e. 

$$D := \sum_{\mathbf b} p^u_{\mathbf b}\, \|\rho^u_{\mathbf b} - \rho^c_{\mathbf b}\|_{\rm tr}, \tag{38}$$

where $p^\eta_{\mathbf b}$ is the probability of obtaining the measurement result $\mathbf b = (b_1, \ldots, b_m)$, and $\rho^\eta_{\mathbf b}$ is the state 
for the target system. We would also like to bound the error in the probabilities obtained. This 
is because measurement results with many ones will be difficult to correct, so we need to ensure 
that the probabilities for those measurement results remain small. The error in the probability 
distribution can be quantified by 

$$\Delta p := \sum_{\mathbf b} |p^u_{\mathbf b} - p^c_{\mathbf b}|. \tag{39}$$

We can bound both those errors using the quantity 

$$D_{\rm av} := \sum_{\mathbf b} \|p^u_{\mathbf b}\rho^u_{\mathbf b} - p^c_{\mathbf b}\rho^c_{\mathbf b}\|_{\rm tr}. \tag{40}$$

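Because the trace distance is non-increasing under channels, and we obtain $\Delta p$ by applying the 
completely depolarising channel to both $p^u_{\mathbf b}\rho^u_{\mathbf b}$ and $p^c_{\mathbf b}\rho^c_{\mathbf b}$ in Eq. (40), we have $\Delta p \le D_{\rm av}$. Then we have 

$$p^u_{\mathbf b}\|\rho^u_{\mathbf b} - \rho^c_{\mathbf b}\|_{\rm tr} \le \|p^u_{\mathbf b}\rho^u_{\mathbf b} - p^c_{\mathbf b}\rho^c_{\mathbf b}\|_{\rm tr} + \|p^c_{\mathbf b}\rho^c_{\mathbf b} - p^u_{\mathbf b}\rho^c_{\mathbf b}\|_{\rm tr} \le 2\|p^u_{\mathbf b}\rho^u_{\mathbf b} - p^c_{\mathbf b}\rho^c_{\mathbf b}\|_{\rm tr}. \tag{41}$$

Summing over $\mathbf b$ then gives $D \le 2D_{\rm av}$. 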
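
The two inequalities can be sanity-checked numerically in a simplified setting. The sketch below assumes, for illustration only, that the target states are diagonal (classical distributions), so that the trace norm reduces to the $\ell_1$ norm of the diagonal, and it takes the trace norm without a factor of $1/2$. All names are hypothetical, and the random instances are not data from the construction.

```python
import random

def l1(u, v):
    """l1 distance between two vectors (trace norm of the difference of diagonal states)."""
    return sum(abs(a - b) for a, b in zip(u, v))

def random_distribution(rng, size):
    weights = [rng.random() for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]

def error_measures(p_u, rho_u, p_c, rho_c):
    """Return (D, Delta p, D_av) as in Eqs. (38)-(40), for diagonal target states."""
    d = sum(pu * l1(ru, rc) for pu, ru, rc in zip(p_u, rho_u, rho_c))
    delta_p = l1(p_u, p_c)
    d_av = sum(
        l1([pu * r for r in ru], [pc * r for r in rc])
        for pu, ru, pc, rc in zip(p_u, rho_u, p_c, rho_c)
    )
    return d, delta_p, d_av

rng = random.Random(0)
checks = []
for _ in range(200):
    outcomes, dim = 6, 4
    p_u, p_c = random_distribution(rng, outcomes), random_distribution(rng, outcomes)
    rho_u = [random_distribution(rng, dim) for _ in range(outcomes)]
    rho_c = [random_distribution(rng, dim) for _ in range(outcomes)]
    d, delta_p, d_av = error_measures(p_u, rho_u, p_c, rho_c)
    checks.append(delta_p <= d_av + 1e-12 and d <= 2 * d_av + 1e-12)
print(all(checks))
```

In this diagonal setting the bound on $\Delta p$ follows from the triangle inequality applied to the sum over the entries of each block, and the bound on $D$ carries the factor of 2 from Eq. (41).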

Theorem 5.4 (Error bounds). The error between compressed and uncompressed schemes can be 
bounded as 

$$D_{\rm av} = O(\epsilon' + \epsilon k' \log m). \tag{42}$$

Proof. In order to bound the value of $D_{\rm av}$, we have two main sources of error. First is that in 
preparing the initial state, where the high Hamming weight terms are omitted, and second is the 
sequence of approximations in the measurement operators in Eqs. (35) and (36). The approach to 
bounding the error is as follows. In locating the position of a single one in the measurement result, 
there is a contribution of $O(\epsilon)$ to the error from each of the steps as described in Eq. (35). These 
need to be performed $\log m$ times, and as a result the contribution to the error is $O(\epsilon\log m)$. If $h$ 
ones need to be located, the worst case is where the sequence of measurements to locate these ones 
is independent, so the contribution to the error is $O(h\epsilon\log m)$. Since the error due to locating no 
more than $k'$ ones will be $O(\epsilon')$, we can take $h \le k'$, and bound the overall error by $O(\epsilon k' \log m + \epsilon')$. 

To make this analysis rigorous, we first want to omit the high Hamming weight measurement 
results. For the measurements in the uncompressed case, the probability of measurement results 
with Hamming weight over $k'$ is $O(\epsilon')$. This is because the probability of obtaining each one is no 
more than $2\alpha^2\beta^2$. Because we take $\beta^2 \approx 1/8m$, the probability of obtaining more than $k'$ ones with 
$k' = O(\log(1/\epsilon')/\log\log(1/\epsilon'))$ is $O(\epsilon')$. Recall that we place a bound $\epsilon'$ on errors that only occur 
once in each time step, and use a corresponding Hamming weight cutoff $k'$, whereas we use $k$ for 
limiting errors that occur multiple times in the measurement process. 
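
To get a feel for how slowly the cutoff $k'$ grows, the following sketch computes the tail probability of the Hamming weight under a simplifying assumption that is not made in the text: the $m$ outcomes are treated as independent, each being one with probability exactly $2\alpha^2\beta^2 \le 2\beta^2 = 1/4m$. The function names and the chosen values of $m$ and $\epsilon'$ are hypothetical.

```python
def tail_probability(m, p, k):
    """P(X > k) for X ~ Binomial(m, p), summing the upper terms directly."""
    term = (1.0 - p) ** m                 # P(X = 0)
    tail = 0.0
    for j in range(m):
        term *= (m - j) / (j + 1) * p / (1.0 - p)   # now equals P(X = j + 1)
        if j + 1 > k:
            tail += term
    return tail

def smallest_cutoff(m, p, eps_prime):
    """Smallest k' with P(more than k' ones) <= eps_prime under the independence assumption."""
    k = 0
    while tail_probability(m, p, k) > eps_prime:
        k += 1
    return k

m = 1024
p = 1.0 / (4 * m)                         # upper bound 2 * beta^2 with beta^2 = 1/(8m)
cutoffs = {eps: smallest_cutoff(m, p, eps) for eps in (1e-2, 1e-4, 1e-6, 1e-8)}
print(cutoffs)
```

The cutoff increases only by a small additive amount each time $\epsilon'$ is reduced by a factor of 100, which is the qualitative behaviour expected from $k' = O(\log(1/\epsilon')/\log\log(1/\epsilon'))$.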

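To bound $D_{\rm av}$, we also need to take account of the probability of high Hamming weight 
measurement results for the compressed measurement. We can do this in the following way. First 
use 

$$\sum_{|\mathbf b|>k'} (p^c_{\mathbf b} - p^u_{\mathbf b}) = \sum_{|\mathbf b|\le k'} (p^u_{\mathbf b} - p^c_{\mathbf b}) \le \sum_{|\mathbf b|\le k'} |p^u_{\mathbf b} - p^c_{\mathbf b}| \le \sum_{|\mathbf b|\le k'} \|p^u_{\mathbf b}\rho^u_{\mathbf b} - p^c_{\mathbf b}\rho^c_{\mathbf b}\|_{\rm tr}. \tag{43}$$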

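Therefore we can bound $D_{\rm av}$ by 

$$\begin{aligned}
D_{\rm av} &\le \sum_{|\mathbf b|>k'} (p^c_{\mathbf b} + p^u_{\mathbf b}) + \sum_{|\mathbf b|\le k'} \|p^u_{\mathbf b}\rho^u_{\mathbf b} - p^c_{\mathbf b}\rho^c_{\mathbf b}\|_{\rm tr} \\
&= \sum_{|\mathbf b|>k'} (p^c_{\mathbf b} - p^u_{\mathbf b}) + 2\sum_{|\mathbf b|>k'} p^u_{\mathbf b} + \sum_{|\mathbf b|\le k'} \|p^u_{\mathbf b}\rho^u_{\mathbf b} - p^c_{\mathbf b}\rho^c_{\mathbf b}\|_{\rm tr} \\
&\le O(\epsilon') + 2\sum_{|\mathbf b|\le k'} \|p^u_{\mathbf b}\rho^u_{\mathbf b} - p^c_{\mathbf b}\rho^c_{\mathbf b}\|_{\rm tr}.
\end{aligned} \tag{44}$$

This means that omitting the high Hamming weight measurement results can only change the 
results by a multiplying factor and an $O(\epsilon')$ term. For convenience we define 

$$D'_{\rm av} := \sum_{|\mathbf b|\le k'} \|p^u_{\mathbf b}\rho^u_{\mathbf b} - p^c_{\mathbf b}\rho^c_{\mathbf b}\|_{\rm tr}. \tag{45}$$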



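Next we note that the distance measure can be written as a trace distance between two states, 
rather than the average of trace distances. That is, 

$$D'_{\rm av} = \Big\| \sum_{|\mathbf b|\le k'} \left(p^u_{\mathbf b}\,|\mathbf b\rangle\langle\mathbf b| \otimes \rho^u_{\mathbf b} - p^c_{\mathbf b}\,|\mathbf b\rangle\langle\mathbf b| \otimes \rho^c_{\mathbf b}\right) \Big\|_{\rm tr}. \tag{46}$$

The reason for this is that the complete matrix is block-diagonal, with $p^u_{\mathbf b}\rho^u_{\mathbf b} - p^c_{\mathbf b}\rho^c_{\mathbf b}$ as the blocks on 
the diagonal. The trace distance for the entire density matrix is just the sum of the trace distances 
for the blocks on the diagonal, which is the definition of $D'_{\rm av}$. 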

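Let us denote by $|\psi_\eta\rangle$ the states obtained after preparation and controlled operations. Then we 
have 

$$p^\eta_{\mathbf b}\rho^\eta_{\mathbf b} = {\rm Tr}_{\rm ctrl}\left(M_{\eta,\mathbf b}|\psi_\eta\rangle\langle\psi_\eta|M^\dagger_{\eta,\mathbf b}\right). \tag{47}$$

Here ${\rm Tr}_{\rm ctrl}$ indicates a trace over the control registers. Then we have 

$$D'_{\rm av} = \Big\| \sum_{|\mathbf b|\le k'} \left[|\mathbf b\rangle\langle\mathbf b| \otimes {\rm Tr}_{\rm ctrl}\left(M_{u,\mathbf b}|\psi_u\rangle\langle\psi_u|M^\dagger_{u,\mathbf b}\right) - |\mathbf b\rangle\langle\mathbf b| \otimes {\rm Tr}_{\rm ctrl}\left(M_{c,\mathbf b}|\psi_c\rangle\langle\psi_c|M^\dagger_{c,\mathbf b}\right)\right] \Big\|_{\rm tr}. \tag{48}$$

Now note that the maps defined by 

$$\mathcal{E}_\eta(\rho) := \sum_{\mathbf b} |\mathbf b\rangle\langle\mathbf b| \otimes {\rm Tr}_{\rm ctrl}\left(M_{\eta,\mathbf b}\,\rho\, M^\dagger_{\eta,\mathbf b}\right), \tag{49}$$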



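are completely-positive trace-preserving (CPTP). This means that trace distance will not increase 
under these maps. Now describing the states with the high Hamming weight components removed 
by $|\tilde\psi_\eta\rangle$ we have 

$$\begin{aligned}
&\Big\| \sum_{|\mathbf b|\le k'} \left[|\mathbf b\rangle\langle\mathbf b| \otimes {\rm Tr}_{\rm ctrl}\left(M_{\eta,\mathbf b}|\psi_\eta\rangle\langle\psi_\eta|M^\dagger_{\eta,\mathbf b}\right) - |\mathbf b\rangle\langle\mathbf b| \otimes {\rm Tr}_{\rm ctrl}\left(M_{\eta,\mathbf b}|\tilde\psi_\eta\rangle\langle\tilde\psi_\eta|M^\dagger_{\eta,\mathbf b}\right)\right] \Big\|_{\rm tr} \\
&\qquad \le \left\| \mathcal{E}_\eta(|\psi_\eta\rangle\langle\psi_\eta|) - \mathcal{E}_\eta(|\tilde\psi_\eta\rangle\langle\tilde\psi_\eta|) \right\|_{\rm tr} = O(\epsilon).
\end{aligned} \tag{50}$$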



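As a result, using the triangle inequality gives 

$$D'_{\rm av} \le O(\epsilon) + \Big\| \sum_{|\mathbf b|\le k'} \left[|\mathbf b\rangle\langle\mathbf b| \otimes {\rm Tr}_{\rm ctrl}\left(M_{u,\mathbf b}|\tilde\psi_u\rangle\langle\tilde\psi_u|M^\dagger_{u,\mathbf b}\right) - |\mathbf b\rangle\langle\mathbf b| \otimes {\rm Tr}_{\rm ctrl}\left(M_{c,\mathbf b}|\tilde\psi_c\rangle\langle\tilde\psi_c|M^\dagger_{c,\mathbf b}\right)\right] \Big\|_{\rm tr}. \tag{51}$$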

Next, each measurement operator $M_{\eta,\mathbf b}$ can be obtained by a sequence of measurement operators 
in our recursive measurement scheme, which will yield a sequence of measurement results $d_1, d_2, \ldots$. 
Each $\mathbf b$ will correspond to a unique sequence of $d_i$ measurement results. (Recall that $b_j$ are the 
individual results of measurements on uncompressed qubits, whereas $d_i$ are the individual results 
from the recursive measurement.) Therefore we can relabel the basis states such that we have 

$$D'_{\rm av} = \Big\| \sum_{\mathbf d} \left\{|\mathbf d\rangle\langle\mathbf d| \otimes {\rm Tr}_{\rm ctrl}\left[M'_{u,\mathbf d}|\psi_u\rangle\langle\psi_u|(M'_{u,\mathbf d})^\dagger\right] - |\mathbf d\rangle\langle\mathbf d| \otimes {\rm Tr}_{\rm ctrl}\left[M'_{c,\mathbf d}|\psi_c\rangle\langle\psi_c|(M'_{c,\mathbf d})^\dagger\right]\right\} \Big\|_{\rm tr}. \tag{52}$$

Now the measurement operators that are chosen at step $\ell$ in the recursive measurement scheme 
will depend on the measurement results that have been obtained at steps 1 to $\ell - 1$. Therefore we 
can write the measurement operators as 

$$M'_{\eta,\mathbf d} = \prod_{\ell=1}^{K} M^{(d_1,\ldots,d_{\ell-1})}_{\eta,d_\ell}. \tag{53}$$



Here $K$ is the number of measurement operators to locate the ones. For measurement result $\mathbf b$, the 
number of measurements required is no more than $1 + 2|\mathbf b|\log m$. As we are taking $\mathbf b$ such that 
$|\mathbf b| \le k'$, we can take $K = 1 + 2k'\log m$. 
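
The bound of $1 + 2|\mathbf b|\log m$ measurements can be checked exhaustively for a small $m$. The sketch below counts, for each classical outcome string, the number of two-outcome tests that the recursion performs, taking logarithms to base 2 and ignoring the truncation at $k'$ ones; it is an illustration with hypothetical names, not part of the proof.

```python
from itertools import product
from math import log2

def count_tests(bits, lo, hi):
    """Number of all-zero tests performed by the recursion on the block bits[lo:hi]."""
    if not any(bits[lo:hi]) or hi - lo == 1:
        return 1                      # result 0, or a single qubit: no further recursion
    mid = (lo + hi) // 2
    return 1 + count_tests(bits, lo, mid) + count_tests(bits, mid, hi)

m = 8
log_m = int(log2(m))
# Largest amount by which the number of tests exceeds the bound, over all 2^m outcomes.
worst_slack = max(
    count_tests(b, 0, m) - (1 + 2 * sum(b) * log_m)
    for b in product((0, 1), repeat=m)
)
single_one = count_tests((0, 0, 0, 0, 0, 1, 0, 0), 0, m)
print(worst_slack, single_one)
```

The bound is met with equality when there are no ones or exactly one, since a single one forces one test at the root and two tests at each of the $\log m$ levels below it.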

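Using this notation, we can define CPTP maps by 

$$\mathcal{E}_{\eta,\ell}(\rho) := \sum_{d_1,\ldots,d_\ell} M^{(d_1,\ldots,d_{\ell-1})}_{\eta,d_\ell,{\rm proj}}\; \rho\; \left(M^{(d_1,\ldots,d_{\ell-1})}_{\eta,d_\ell,{\rm proj}}\right)^\dagger, \tag{54}$$

where 

$$M^{(d_1,\ldots,d_{\ell-1})}_{\eta,d_\ell,{\rm proj}} := |d_\ell\rangle \otimes |d_{\ell-1}\rangle\langle d_{\ell-1}| \otimes \cdots \otimes |d_1\rangle\langle d_1| \otimes M^{(d_1,\ldots,d_{\ell-1})}_{\eta,d_\ell}. \tag{55}$$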

Each map simply performs the appropriate measurement based on the prior measurement results 
(which are stored in ancillas), and appends an ancilla depending on the result of the measurement. 
In terms of these maps, the trace distance we wish to bound may be written as 

$$D'_{\rm av} = \left\| {\rm Tr}_{\rm ctrl}\, \mathcal{E}_{u,K} \cdots \mathcal{E}_{u,1}(|\psi_u\rangle\langle\psi_u|) - {\rm Tr}_{\rm ctrl}\, \mathcal{E}_{c,K} \cdots \mathcal{E}_{c,1}(|\psi_c\rangle\langle\psi_c|) \right\|_{\rm tr}. \tag{56}$$



As has been noted above, we can omit the high Hamming weight contributions to the states 
$|\psi_\eta\rangle$, with a possible change in the trace distance of $O(\epsilon)$. The reason for this is that the trace 
distance is non-increasing under CPTP maps. Our goal is now to successively approximate each 
of the maps in the sequence, at each stage bounding the introduced error by $O(\epsilon)$. At the end we 
will obtain two identical states, and then bound $D'_{\rm av}$ by $O(K\epsilon)$. 

More specifically, we want to approximate the evolution of the states for given measurement 
results as in Eqs. (35) and (36). Note that the reasoning given in the proof of Theorem 5.3 also gives 
a recursive method to determine the amplitudes in the states $|\tilde\psi_\eta^{(d_1,\ldots,d_\ell)}\rangle$, starting from $\gamma_x^{()} = \xi^m_x$. 
This means that the definitions of these states are unambiguous. We now consider the approximate 
unnormalised states after $\ell - 1$ measurements in the recursive measurement scheme, $|\tilde\psi_\eta^{(d_1,\ldots,d_{\ell-1})}\rangle$, 
as given by Eqs. (32) and (33). We then define the states including the ancilla qubits containing 
the measurement results as 

$$\rho_{\eta,\ell-1} := \sum_{d_1,\ldots,d_{\ell-1}} |d_{\ell-1}\rangle\langle d_{\ell-1}| \otimes \cdots \otimes |d_1\rangle\langle d_1| \otimes |\tilde\psi_\eta^{(d_1,\ldots,d_{\ell-1})}\rangle\langle\tilde\psi_\eta^{(d_1,\ldots,d_{\ell-1})}|. \tag{57}$$



We wish to bound the error in approximating $\mathcal{E}_{\eta,\ell}(\rho_{\eta,\ell-1})$ by $\rho_{\eta,\ell}$. 
In approximating $\mathcal{E}_{u,\ell}(\rho_{u,\ell-1})$ by $\rho_{u,\ell}$ there is only one approximation: that of omitting the high 
Hamming weight states in applying the rotation. The error in this approximation will be $O(\epsilon)$ times 
the norm of the state. Because the norm of the state is only changed by omitting high Hamming 
weight components, it can only be decreased. Therefore the error is $O(\epsilon)$. Similarly, there is error 
in approximating $\mathcal{E}_{c,\ell}(\rho_{c,\ell-1})$ by $\rho_{c,\ell}$ due to omitting high Hamming weight components, which is 
bounded by $O(\epsilon)$. There is also error because the $U_n$ rotations are not performed exactly. Two 
such rotations are performed, each with error bounded by $O(\epsilon)$, resulting in the overall error being 
bounded by $O(\epsilon)$. 

Therefore, we can start with Eq. (52), remove the high Hamming weight components from the 
initial states, then proceed taking $\ell = 1$ to $K$, replacing $\mathcal{E}_{\eta,\ell}(\rho_{\eta,\ell-1})$ by $\rho_{\eta,\ell}$ at each step. At each 
step the distance is increased by $O(\epsilon)$, and there are $K$ steps, so we obtain 

$$D'_{\rm av} \le O(K\epsilon) + \left\| {\rm Tr}_{\rm ctrl}(\rho_{u,K}) - {\rm Tr}_{\rm ctrl}(\rho_{c,K}) \right\|_{\rm tr}. \tag{58}$$

But, because the same amplitudes have been obtained for the compressed and uncompressed cases, 
the same state is obtained after tracing over the control registers, and ${\rm Tr}_{\rm ctrl}(\rho_{u,K}) = {\rm Tr}_{\rm ctrl}(\rho_{c,K})$. 
Therefore we obtain 

$$D'_{\rm av} = O(K\epsilon) = O(\epsilon k' \log m). \tag{59}$$

As $D_{\rm av} = O(\epsilon' + D'_{\rm av})$, this yields Eq. (42), as required. □ 

To summarise the sources of error in the above proof, these are as follows. 

1. Omitting measurement results with Hamming weight greater than $k'$; see Eq. (44). 

2. Omitting the high Hamming weight components of the initial states; see Eq. (51). 

3. Omitting high Hamming weight components in each step of the recursive measurement. 

4. Inaccuracy in performing the U n operations in each step of the recursive measurement. 

Error sources 3 and 4 give a contribution to the error of $O(\epsilon)$ times the norm of the state for each 
step of the recursive measurement. However, for many initial sequences of measurement results, at 
step $\ell$ all ones have already been located, so there are no further measurements needed. This means 
that the measurements at this point are just the identity, and no further error is introduced for that 
sequence of initial measurement results. This means that bounding the additional error by $O(\epsilon)$ 
for each $\ell$ overestimates the error. We will show that the error can be bounded by using the mean 
number of ones that are measured. In the case of the uncompressed measurements, the probability 
of each one is $\le 4\alpha^2\beta^2$. Because $\beta^2 \approx 1/8m$, the expected number of ones is $\le 4\beta^2 m = O(1)$. 

Theorem 5.5 (Improved error bounds). Provided $\epsilon = O(1/(k' \log m))$, the error between the
compressed and uncompressed schemes can be bounded as

$$D_{\mathrm{av}} = O(\epsilon' + \epsilon \log m). \tag{60}$$

Proof. More specifically, $\rho_{\eta,\ell-1}$ will have a component where the recursive measurement scheme
has not terminated yet, and another measurement needs to be performed. This component will be
that where the ancillas contain $d_1, \ldots, d_{\ell-1}$ corresponding to sequences of measurement results such
that further measurements need to be performed. There will also be a component corresponding to
sequences of measurement results where the recursive measurement scheme has finished. We will
denote the components where the recursive measurement has not terminated
or has terminated by $\rho^{\mathrm{con}}_{\eta,\ell-1}$ and $\rho^{\mathrm{fin}}_{\eta,\ell-1}$, respectively. More explicitly, if we denote by $S_{\mathrm{con}}$ and $S_{\mathrm{fin}}$
the sets of measurement results $(d_1, \ldots, d_{\ell-1})$ that correspond to a recursive measurement that has
not terminated or has terminated, respectively, then we have

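$$\rho^{\mathrm{con/fin}}_{\eta,\ell-1} := \sum_{(d_1,\ldots,d_{\ell-1}) \in S_{\mathrm{con/fin}}} |d_{\ell-1}\rangle\langle d_{\ell-1}| \otimes \cdots \otimes |d_1\rangle\langle d_1| \otimes |\psi^{d_1 \cdots d_{\ell-1}}_{\eta}\rangle\langle\psi^{d_1 \cdots d_{\ell-1}}_{\eta}|. \tag{61}$$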



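Because the measurement acts only on $\rho^{\mathrm{con}}_{\eta,\ell-1}$, and the error in the measurement is bounded by $\epsilon$
times the trace of the state the measurement acts upon, the error in approximating $\mathcal{E}_{\eta,\ell}(\rho_{\eta,\ell-1})$ by
$\rho_{\eta,\ell}$ will be bounded by $O(\epsilon\,\mathrm{Tr}(\rho^{\mathrm{con}}_{\eta,\ell-1}))$. Therefore the total error from sources 3 and 4 is bounded by

$$O\left(\epsilon \sum_{\ell=1}^{K} \mathrm{Tr}(\rho^{\mathrm{con}}_{\eta,\ell-1})\right). \tag{62}$$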

But, because the number of measurement steps need be no larger than $1 + h \log m$, where $h$ is the
number of ones found by the measurement, $\mathrm{Tr}(\rho^{\mathrm{con}}_{\eta,h\log m})$ is no greater than the probability
that the number of ones is $\ge h$. Denoting the probability that the number of ones is $\ge h$ by $p(|\mathbf{b}| \ge h)$,
we have

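$$\sum_{h=0}^{m} p(|\mathbf{b}| \ge h) = \sum_{h=0}^{m}\sum_{j=h}^{m} p(|\mathbf{b}| = j) = \sum_{j=0}^{m}\sum_{h=0}^{j} p(|\mathbf{b}| = j) = \sum_{j=0}^{m} (j+1)\, p(|\mathbf{b}| = j) = \langle|\mathbf{b}|\rangle + 1. \tag{63}$$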

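Therefore we can bound the sum of the traces by

$$\sum_{\ell=1}^{K} \mathrm{Tr}(\rho^{\mathrm{con}}_{\eta,\ell-1}) \le \sum_{\ell=1}^{K} \mathrm{Tr}(\rho^{\mathrm{con}}_{\eta,\lfloor(\ell-1)/\log m\rfloor \log m}) \le \log m \sum_{h=0}^{m} p(|\mathbf{b}| \ge h) = (\langle|\mathbf{b}|\rangle + 1)\log m. \tag{64}$$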
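
The counting identity in Eq. (63), which turns a sum of tail probabilities into an expectation value, can be checked numerically. The following is a minimal sketch; the distribution `p` over the number of ones is an arbitrary example chosen for illustration and is not taken from the measurement scheme.

```python
from fractions import Fraction

def tail_sum(p):
    """Return sum_{h=0}^{m} P(|b| >= h) for a distribution p[j] = P(|b| = j)."""
    m = len(p) - 1
    return sum(sum(p[j] for j in range(h, m + 1)) for h in range(m + 1))

def mean_plus_one(p):
    """Return <|b|> + 1 for the same distribution."""
    return sum(j * pj for j, pj in enumerate(p)) + 1

# Illustrative distribution on {0, 1, 2, 3} (an assumption, used only for this check).
p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
assert sum(p) == 1
assert tail_sum(p) == mean_plus_one(p)
```

Exact rational arithmetic is used so that the two sides can be compared with `==` rather than within a floating-point tolerance.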

Next we need to take into account the fact that here the expectation value of the number of
ones is for the approximate states $\rho^{\mathrm{con}}_{\eta,\ell-1}$, not for the exact uncompressed measurement scheme. To
take account of this difference, we can use the cumulative error to bound the error in the norm
of the state at each step. Note that the norm of $\rho^{\mathrm{con}}_{\eta,\ell-1}$ is the same for $\eta = u$ and $\eta = c$, so we
need only perform the analysis for $\eta = u$. Let $A_{\ell-1}$ denote the norm, for the exact uncompressed
measurement, of the component where the recursive measurement scheme has not stopped before
step $\ell$. In addition, let $E_{\ell-1}$ denote the cumulative error before step $\ell$. Then the increment in the
error is bounded by $\epsilon$ times the norm of the non-terminated component, which is bounded by $A_{\ell-1}$ plus
the cumulative error.

$$E_\ell \le E_{\ell-1} + O(\epsilon A_{\ell-1} + \epsilon E_{\ell-1}) = E_{\ell-1}[1 + O(\epsilon)] + O(\epsilon A_{\ell-1}). \tag{65}$$

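Multiplying both sides by $[1 + O(\epsilon)]^{K-\ell}$, we obtain

$$E_\ell [1 + O(\epsilon)]^{K-\ell} \le E_{\ell-1}[1 + O(\epsilon)]^{K-\ell+1} + O\left(\epsilon A_{\ell-1}[1 + O(\epsilon)]^{K-\ell}\right). \tag{66}$$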
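As a result, the final error is bounded by

$$\begin{aligned}
E_K &\le \sum_{\ell=1}^{K} O\left(\epsilon A_{\ell-1}[1 + O(\epsilon)]^{K-\ell}\right) \\
&\le [1 + O(\epsilon)]^{K} \sum_{\ell=1}^{K} O(\epsilon A_{\ell-1}) \\
&\le O\left(\exp(\epsilon K)\,\epsilon\,(\langle|\mathbf{b}|\rangle + 1)\log m\right).
\end{aligned} \tag{67}$$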

Here the expectation value of the number of ones is for the exact scheme, which is $O(1)$, and we
therefore find that the error is bounded by $O(\exp(\epsilon K)\,\epsilon \log m)$. Recall that we take $K = O(k' \log m)$.
This means that, provided $\epsilon = O(1/(k' \log m))$, $\exp(\epsilon K)$ is $O(1)$, and we obtain scaling of the error
of $O(\epsilon \log m)$. Adding $O(\epsilon' + \epsilon)$ to take account of error sources 1 and 2 yields the result given in
the Theorem. □
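
The recurrence in Eqs. (65) to (67) can be explored numerically. The sketch below iterates the recurrence with equality and compares the result with the closed-form bound. The constant `c` stands in for the unspecified constants hidden in the $O(\cdot)$ notation, and the weights `A` are hypothetical values chosen for illustration; neither is taken from the paper.

```python
def iterate_error(eps, A, c=1.0):
    """Iterate E_l = E_{l-1} * (1 + c*eps) + c*eps*A[l-1], starting from E_0 = 0.

    A[l-1] plays the role of A_{l-1}; c is an assumed stand-in for the
    constants hidden in the O(.) notation.
    """
    E = 0.0
    for a in A:
        E = E * (1.0 + c * eps) + c * eps * a
    return E

def closed_form_bound(eps, A, c=1.0):
    """Bound in the style of Eq. (67): (1 + c*eps)^K * c*eps * sum(A)."""
    K = len(A)
    return (1.0 + c * eps) ** K * c * eps * sum(A)

# Hypothetical non-increasing weights A_0 >= A_1 >= ... (illustrative only).
K = 200
A = [0.9 ** l for l in range(K)]
eps = 1.0 / K  # eps = O(1/K), so (1 + eps)^K stays below e
E_K = iterate_error(eps, A)
assert E_K <= closed_form_bound(eps, A)
assert closed_form_bound(eps, A) <= 2.72 * eps * sum(A)
```

With `eps` of order `1/K` the prefactor `(1 + eps)**K` remains bounded by a constant, which mirrors the role of the condition $\epsilon = O(1/(k' \log m))$ in the Theorem.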

Given the conditions of this Theorem, the overall error for each time step is $O(\epsilon' + \epsilon \log m)$. This
includes error in simulating the driving Hamiltonian. The driving Hamiltonian may be applied up
to $k'$ times, though the expected number of times is $O(1)$. As the allowable error in the driving
Hamiltonian is $O(\epsilon')$, that gives a contribution of $O(\epsilon')$ to the error in each time step. As there are
$O(T)$ time steps, the total error is $O(\epsilon' T + \epsilon T \log m)$. To limit the error of the overall scheme to $\epsilon_{\mathrm{tot}}$,
we take $\epsilon' = O(\epsilon_{\mathrm{tot}}/T)$ and $\epsilon = O(\epsilon_{\mathrm{tot}}/(T \log m))$. Then $k' = O(\log(1/\epsilon')) = O(\log(T/\epsilon_{\mathrm{tot}}))$. As we
consider large $T$ and small $\epsilon_{\mathrm{tot}}$, we therefore have $\epsilon = O(1/[\log(T/\epsilon_{\mathrm{tot}}) \log m]) = O(1/(k' \log m))$.
This means that the condition of the Theorem is satisfied with this choice of parameters, and the
total error will be bounded by $\epsilon_{\mathrm{tot}}$.
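
The parameter choices described above can be written out as a short sketch. All $O(\cdot)$ constants are set to 1 and the number of time steps is taken to be exactly `T`; both are assumptions made for illustration, and the function names are hypothetical.

```python
import math

def choose_parameters(T, eps_tot, m):
    """Pick eps', eps and k' following the scalings in the text.

    All O(.) constants are set to 1 (an assumption of this sketch), and
    k' is taken as log(1/eps'), the upper bound quoted in the text.
    """
    eps_prime = eps_tot / T                # eps' = O(eps_tot / T)
    eps = eps_tot / (T * math.log(m))      # eps  = O(eps_tot / (T log m))
    k_prime = math.log(1.0 / eps_prime)    # k'   = O(log(T / eps_tot))
    return eps_prime, eps, k_prime

def satisfies_condition(T, eps_tot, m):
    """Check eps <= 1 / (k' log m), the condition of Theorem 5.5 with constant 1."""
    _, eps, k_prime = choose_parameters(T, eps_tot, m)
    return eps <= 1.0 / (k_prime * math.log(m))

def total_error(T, eps_tot, m):
    """Accumulate eps' + eps * log(m) over T time steps (constants set to 1)."""
    eps_prime, eps, _ = choose_parameters(T, eps_tot, m)
    return T * (eps_prime + eps * math.log(m))
```

For example, `satisfies_condition(1000, 0.1, 2**20)` evaluates to `True`, and `total_error(1000, 0.1, 2**20)` is twice `eps_tot` under these unit constants, that is, proportional to $\epsilon_{\mathrm{tot}}$ as the text requires.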



6 Proof of Main Theorem 

Finally we are in a position to prove Theorem 3.2.

Proof of Theorem 3.2. First, the number of oracle queries is $O(k'T)$, because we have divided
the simulation into $O(T)$ time intervals, and limit the number of queries required within each time
interval to $O(k')$. The value of $k'$ is chosen to ensure that the error due to omitting high
Hamming-weight states $O(1)$ times within each time interval is no more than $\epsilon'$. We can bound the total
error by $\epsilon_{\mathrm{tot}}$ if we take $\epsilon' = O(\epsilon_{\mathrm{tot}}/T)$, which means that $k'$ scales as

$$O\left(\frac{\log(T/\epsilon_{\mathrm{tot}})}{\log\log(T/\epsilon_{\mathrm{tot}})}\right). \tag{68}$$

Then the overall number of oracle calls scales as

$$O\left(\frac{T\log(T/\epsilon_{\mathrm{tot}})}{\log\log(T/\epsilon_{\mathrm{tot}})}\right). \tag{69}$$

Omitting the dependence on $\epsilon_{\mathrm{tot}}$ gives the result given in the statement of the Theorem.

Next we discuss the number of gates required for Alg. 5.2. The maximum number of steps in
the recursive procedure is $1 + 2k' \log m$, but the expected number of steps is $O(\log m)$. For the full
algorithm for the evolution over time $T$, there are many of these recursive measurements, and the
probability of the average number of steps differing significantly from its expected value is small.
Similarly to the analysis in Section 4.1, an upper bound of $O(1/\epsilon_{\mathrm{tot}})$ times the average value will not
be exceeded with probability $1 - O(\epsilon_{\mathrm{tot}})$. As $\epsilon_{\mathrm{tot}}$ is taken to be constant, this does not affect the final
result. Because $U_n$ and $U_n^\dagger$ are performed at each step, these operations are performed $O(\log m)$
times. As was found above, the complexity of the operation $U_n$ is $O(k[\log m + \log\log(1/\epsilon)])$.
Therefore the overall complexity for this time step is $O(k[(\log m)^2 + \log m \log\log(1/\epsilon)])$.

It is also necessary to perform $O(k')$ time evolutions under the driving Hamiltonian. In the
definition of the problem we let $G$ be the number of gates required for the simulation of the driving
Hamiltonian, so that the number of gates to simulate the driving Hamiltonian in this time step is
$O(k'G)$. Therefore, the scaling for the total number of gates is

$$O\left(TGk' + Tk[(\log m)^2 + \log m \log\log(1/\epsilon)]\right). \tag{70}$$

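Next we use $\epsilon' = O(\epsilon_{\mathrm{tot}}/T)$ and $\epsilon = O(\epsilon_{\mathrm{tot}}/(T \log m))$. As discussed in Section 4.6, we can take
$\log m = O(\log(\|H\|T/\epsilon_{\mathrm{tot}}))$. Considering the scaling with large $\|H\|$, the total number of gates
simplifies to

$$O\left(\frac{TG\log(T/\epsilon_{\mathrm{tot}})}{\log\log(T/\epsilon_{\mathrm{tot}})} + \frac{T\log[(T\log m)/\epsilon_{\mathrm{tot}}]}{\log\log[(T\log m)/\epsilon_{\mathrm{tot}}]}(\log m)^2\right). \tag{71}$$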

A further simplification may be obtained by ignoring the double-log factors in the denominators,
and then using the scaling of $\log m$ to give

$$O\left(TG\log(T/\epsilon_{\mathrm{tot}}) + T[\log(T/\epsilon_{\mathrm{tot}}) + \log\log\|H\|][\log(T/\epsilon_{\mathrm{tot}}) + \log\|H\|]^2\right). \tag{72}$$

The number of gates can then be bounded in a simpler but looser form as

$$O\left(TG\log(T/\epsilon_{\mathrm{tot}}) + T[\log(\|H\|T/\epsilon_{\mathrm{tot}})]^3\right). \tag{73}$$

Omitting $\epsilon_{\mathrm{tot}}$, because we take this quantity to be constant, gives the scaling in the Theorem.
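
To see that Eq. (73) is indeed a looser form of Eq. (72), the two expressions inside the $O(\cdot)$ can be evaluated directly. This is a minimal sketch with all constants set to 1 and natural logarithms, both of which are assumptions; the function names are hypothetical, and `H_norm` must exceed 1 so that $\log\log\|H\|$ is defined.

```python
import math

def gates_eq72(T, G, H_norm, eps_tot):
    """Expression inside the O(.) of Eq. (72), with all constants set to 1."""
    lt = math.log(T / eps_tot)   # log(T / eps_tot)
    lh = math.log(H_norm)        # log ||H||, requires H_norm > 1
    return T * G * lt + T * (lt + math.log(lh)) * (lt + lh) ** 2

def gates_eq73(T, G, H_norm, eps_tot):
    """Looser expression inside the O(.) of Eq. (73), with all constants set to 1."""
    lt = math.log(T / eps_tot)
    return T * G * lt + T * math.log(H_norm * T / eps_tot) ** 3

# Illustrative parameter values (assumptions, not taken from the paper).
T, G, H_norm, eps_tot = 1e3, 10, 1e6, 0.1
assert gates_eq72(T, G, H_norm, eps_tot) <= gates_eq73(T, G, H_norm, eps_tot)
```

The comparison holds because $\log\log\|H\| \le \log\|H\|$ whenever $\|H\| > 1$, so each bracket in Eq. (72) is at most $\log(\|H\|T/\epsilon_{\mathrm{tot}})$.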

The number of qubits required for the algorithm is dominated by the number of qubits required
for the recursive measurement scheme. The number of qubits used for the ancilla space is
$O(k[\log m + \log\log(1/\epsilon)])$. In the recursive measurement scheme it may be necessary to duplicate
the ancilla space $k'$ times to ensure that a maximum of $k'$ ones are detected. The overall space
used is therefore

$$O\left(\frac{\log[(T\log m)/\epsilon_{\mathrm{tot}}]}{\log\log[(T\log m)/\epsilon_{\mathrm{tot}}]}\,\frac{\log(T/\epsilon_{\mathrm{tot}})}{\log\log(T/\epsilon_{\mathrm{tot}})}\left[\log m + \log\log\frac{T\log m}{\epsilon_{\mathrm{tot}}}\right]\right). \tag{74}$$

Cancelling the double-log, then omitting double-log factors in the denominator gives

$$O\left(\log[(T\log m)/\epsilon_{\mathrm{tot}}]\,\log(T/\epsilon_{\mathrm{tot}})\,\log m\right). \tag{75}$$

Using the scaling of $m$ then gives

$$O\left(\log(T/\epsilon_{\mathrm{tot}})[\log(T/\epsilon_{\mathrm{tot}}) + \log\log\|H\|]\,\log(\|H\|T/\epsilon_{\mathrm{tot}})\right). \tag{76}$$

A simpler bound can be given as

$$O\left([\log(\|H\|T/\epsilon_{\mathrm{tot}})]^3\right). \tag{77}$$

Again omitting $\epsilon_{\mathrm{tot}}$ gives the scaling in the statement of the Theorem.

Note also that the allowable error in the driving Hamiltonian is $O(\epsilon')$, which is $O(\epsilon_{\mathrm{tot}}/T)$. For
constant $\epsilon_{\mathrm{tot}}$, the allowable error in the implementation of the driving Hamiltonian is $O(1/T)$, as
given in the statement of the Theorem. □






7 Conclusions 



We have shown that any continuous-time query algorithm of cost $T$ can be implemented with a
number of discrete queries close to linear in $T$, and with a number of gates that is also close to
linear in $T$. This means that any continuous-time quantum algorithm can be converted into an
efficient discrete-query algorithm. In contrast, using the algorithm of Ref. [2] directly would result
in a number of gates that is linear in $mT$. That is, the gate complexity would be superlinear in
$\|H\|T$, and similar to what would be obtained just using product formulae.

The methods we have presented may also be used as an alternative to product formulae when
simulating state evolution for a sum of Hamiltonians, where one Hamiltonian is self-inverse, and the
other has large norm, $\|H\|$. Previous work has considered the complexity of Hamiltonian simulation
via product formulae where one Hamiltonian has much larger norm [10]. Even using that approach,
the complexity is only reduced from $O(\|H\|T(\|H\|T/\epsilon_{\mathrm{tot}})^{\delta})$ to $O(\|H\|T(T/\epsilon_{\mathrm{tot}})^{\delta})$. In comparison,
here we have obtained complexity that is polylogarithmic in